Abstract: Network Utility Maximization (NUM) provides a
key conceptual framework to study reward allocation amongst
a collection of users/entities across disciplines as diverse as
economics, law and engineering. In network engineering, this
framework has been particularly insightful towards understanding
how Internet protocols allocate bandwidth, and has motivated
diverse research efforts on distributed mechanisms to maximize
network utility while incorporating new relevant constraints on
energy, power, storage, stability, etc., for systems ranging
from communication networks to the smart grid. However, when
the available resources and/or users' utilities vary over time,
reward allocations will tend to vary, which in turn may have a
detrimental impact on the users' overall satisfaction or quality
of experience.

This paper introduces a generalization of the NUM framework
which explicitly incorporates the detrimental impact of temporal
variability in a user's allocated rewards. It captures
tradeoffs amongst the mean and variability in users' reward
allocations, as well as fairness. We propose a simple online
algorithm to realize these tradeoffs, which, under stationary 
ergodic assumptions, is shown to be asymptotically optimal, i.e., 
achieves a long term performance equal to that of an offline 
algorithm with knowledge of the future variability in the system. 
This substantially extends work on NUM to an interesting class of 
relevant problems where users/entities are sensitive to temporal 
variability in their service or allocated rewards. 



I. Introduction 

Network Utility Maximization (NUM) provides the key 
conceptual framework to study (fair) reward allocation among 
a collection of users/entities across disciplines as diverse as 
economics, law and engineering. [24] introduces NUM by
discussing the problem of fair allocation of a fixed amount of
water $c$ to $N$ farms. The amount of water $w_i$ allocated to the
$i$th farm is a resource which yields a reward of $r_i = f_i(w_i)$
from the $i$th farm. Here, $f_i$ is a concave function mapping
allocated water (resource) to yield (reward), and these can
differ across farms. They point out that the allocation maximizing
$\sum_{1 \le i \le N} r_i$ is a reward (utility) maximizing solution
to the problem. Fairness can be imposed on the allocation by
changing the objective of the problem to $\sum_{1 \le i \le N} U(r_i)$ for
an appropriately chosen concave function $U$. Now, suppose
that we have to make the allocation decisions periodically
to respond to time varying water availability $(c_t)_t$ and utility
functions $(f_{i,t})_t$. Then, subject to the time varying constraints,
one could maximize (see e.g., [28], [15])

$$\sum_{1 \le i \le N} U(\bar{r}_i) \qquad (1)$$

to obtain a resource allocation scheme which is fair in the
delivery of the time average reward $\bar{r}_i$.

In network engineering, the NUM framework has served as a
particularly insightful setting to study (reverse engineer) how
the Internet's congestion control protocols allocate bandwidth
and how to devise schedulers for wireless systems with time varying
channel capacities. It has also motivated the development of
distributed mechanisms to maximize network utility in diverse
settings including communication networks and the smart grid,
while incorporating new relevant constraints on energy, power,
storage, power control, stability, etc.

When the available resources/rewards and/or users' utilities 
vary over time, reward allocations amongst users will tend 
to vary, which in turn may have a detrimental impact on the 
users' utility or perceived service quality. In fact, temporal 
variability in farm water availability can even have a negative 
impact on crop yield (see [26]). This motivates modifications 
of formulations with objectives such as the one in (1) to 
account for this impact. 

Indeed, temporal variability in utility, service, rewards or
associated prices is particularly problematic when humans
are the eventual recipients of the allocations. Humans typically
view temporal variability negatively, as a sign of an unreliable
service, network or market instability, or as a service which,
when viewed through humans' cognitive and behavioral responses,
can translate to a degraded Quality of Experience
(QoE). This in turn can lead users to make decisions, e.g.,
change provider, act upon perceived market instabilities, etc.,
which can have serious implications for businesses, engineered
systems, or economic markets. For problems involving
resource allocation in networks, [3] argues that predictable or
consistent QoS is essential and even points out that it may be
appropriate to intentionally lower the quality delivered to the
user to a level that can be sustained.

For a user viewing a video stream, variations in video 
quality over time have a detrimental impact on the user's QoE, 
see e.g., [30], [14], [22]. Indeed [30] even points out that 
variations in quality can result in a QoE that is worse than 
that of a constant quality video with lower average quality. 
Furthermore, [30] proposed a metric for QoE given below 
which penalizes the standard deviation in quality over time:

$$\text{Mean Quality} - k \sqrt{\text{Temporal Variance in Quality}}$$

where $k$ is an appropriately chosen positive constant. [8]
and [29] argue that less variability in the service processes 
can improve customer satisfaction by studying data for large 
retail banks and major airlines respectively. Aversion towards 
temporal variability is not just restricted to human behavior, 
for instance, see [21]. Also, variability in resource allocation 
in networks can lead to burstiness which can degrade network 
performance (see [5], [23]). These examples illustrate the need 
for extending the NUM framework to incorporate the impact 
of variability. 
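As a small illustration of the QoE metric quoted from [30] above, the following sketch evaluates "mean quality minus k times the temporal standard deviation" for two quality traces. The function name `qoe_score`, the choice k = 1 and the traces themselves are hypothetical and chosen only for illustration; they do not come from [30].

```python
from math import sqrt
from statistics import fmean, pvariance


def qoe_score(quality, k=1.0):
    """Mean quality minus k times the temporal standard deviation.

    `quality` is a sequence of per-slot quality values; `k` is a positive
    constant (its value here is an arbitrary assumption).
    """
    return fmean(quality) - k * sqrt(pvariance(quality))


# Hypothetical traces: one with a higher mean but large swings,
# one constant at a lower mean.
variable = [5.0, 1.0, 5.0, 1.0]   # mean 3.0, temporal variance 4.0
constant = [2.5, 2.5, 2.5, 2.5]   # mean 2.5, temporal variance 0.0

print("variable trace:", qoe_score(variable))
print("constant trace:", qoe_score(constant))
```

With these made-up numbers the constant trace scores higher despite its lower mean, which is the qualitative effect the text attributes to variability.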

This paper introduces a generalized NUM framework which 
explicitly incorporates the detrimental impact of temporal vari- 
ability in a user's allocated rewards. We use the term rewards 
as a proxy representing the resulting utility of, or any other 
quantity associated with, allocations to users/entities in a sys- 
tem. Our goal is to explicitly tackle the task of incorporating 
tradeoffs amongst the mean and variability in users' rewards. 
Thus, for example, in a variance-sensitive NUM setting, it may 
make sense to reduce a user's mean reward so as to reduce 
his/her variability. As will be discussed in the sequel, there are 
many ways in which temporal variations can be accounted for, 
and which, in fact, present distinct technical challenges. In this 
paper, we shall take a simple elegant approach to the problem 
which serves to address systems where tradeoffs amongst the 
mean and variability over time need to be made rather than 
systems where the desired mean (or target) is known (like in 
minimum variance control, see [1]), or where the issue at hand 
is minimization of the variance of a cumulative reward at the 
end of a given (e.g., investment) period. 

To better describe the characteristics of the problem we
introduce some preliminary notation. We shall consider a
network shared by a set $\mathcal{N}$ of users (or other entities) where
$|\mathcal{N}| = N$ denotes the number of users in the system. Throughout
the paper, we distinguish between random variables (and
random functions) and their realizations by using upper case
letters for the former and lower case for the latter. We use bold
letters to denote vectors, e.g., $\mathbf{a} = (a_i : i \in \mathcal{N})$. We let $(a)_{1:T}$
denote the finite length sequence $(a(t) : 1 \le t \le T)$. Let $\mathbb{R}$
($\mathbb{R}_+$) denote the set of (non-negative) real numbers. For any
function $U$ on $\mathbb{R}$, let $U'$ denote its derivative.

Let $r_i(t)$ represent the reward allocated to user $i$ at time
$t$. Then $\mathbf{r}(t) = (r_i(t) : i \in \mathcal{N})$ is the vector of rewards to
users $\mathcal{N}$ at time $t$, and $(\mathbf{r})_{1:T}$ represents the rewards allocated
over time slots $t = 1, \ldots, T$ to the same users. We assume
that reward allocations are subject to time varying network
constraints,

$$c_t(\mathbf{r}(t)) \le 0 \quad \text{for } t = 1, \ldots, T,$$

where each $c_t : \mathbb{R}^N \to \mathbb{R}$ is a convex function, thus
implicitly defining a convex set of feasible reward allocations.
To formally capture the impact of the time-varying rewards on
users' QoE consider the following offline convex optimization
problem OPT(T):
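
$$\max_{(\mathbf{r})_{1:T}} \ \sum_{i \in \mathcal{N}} U_i^E\Bigg( \underbrace{\underbrace{\frac{1}{T} \sum_{t=1}^{T} r_i(t)}_{\text{Mean Reward}} - \underbrace{U_i^V\big(\mathrm{Var}^T((r_i)_{1:T})\big)}_{\text{Penalty for Variability}}}_{\text{User } i\text{'s QoE}} \Bigg)$$

subject to $c_t(\mathbf{r}(t)) \le 0$, $\mathbf{r}(t) \ge 0$ $\forall\, t \in \{1, \ldots, T\}$,

where for each $i \in \mathcal{N}$,

$$\mathrm{Var}^T((r_i)_{1:T}) = \frac{1}{T} \sum_{t=1}^{T} \Big( r_i(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i(\tau) \Big)^2 .$$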



We refer to this as an offline optimization because the
time-varying constraints $(c_t)_{1:T}$ are assumed to be known.
Here, we allow increasing functions $(U_i^E, U_i^V)_{i \in \mathcal{N}}$ that ensure
that the above optimization problem is convex. For user $i$, the
argument of the function $U_i^E$ is our proxy for the user's QoE.
Thus, the desired fairness in the allocation of QoE across the
users can be imposed by appropriately choosing $(U_i^E)_{i \in \mathcal{N}}$.
Note that the first term $\frac{1}{T} \sum_{t=1}^{T} r_i(t)$ in user $i$'s QoE is
the user's mean reward allocation, whereas the presence of
the temporal variance function $\mathrm{Var}^T(\cdot)$ in the second term
penalizes temporal variability in reward allocation as it is the
time average of the (squared) deviation of the reward allocation
from its mean. Further, flexibility in picking $(U_i^V)_{i \in \mathcal{N}}$ allows
for several different ways to penalize the variability. So, one
can in principle have a variability penalty that is convex
or concave in variance. Hence, we see that the formulation
OPT(T) allows us to realize tradeoffs among mean, fairness
and variability associated with the reward allocation by
appropriately choosing the functions $(U_i^E, U_i^V)_{i \in \mathcal{N}}$.
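To make the structure of the OPT(T) objective concrete, here is a minimal sketch that evaluates it for a given allocation. The function names and the particular choices of the fairness function (a shifted logarithm) and the variability penalty (a linear function with slope 0.25) are assumptions made for illustration only; the paper allows any functions satisfying its assumptions.

```python
from math import log


def temporal_variance(rewards):
    """Time average of the squared deviation from the time-average reward."""
    T = len(rewards)
    mean = sum(rewards) / T
    return sum((r - mean) ** 2 for r in rewards) / T


def opt_objective(allocations, u_e, u_v):
    """Objective of OPT(T).

    allocations[i] is user i's reward sequence (r_i(1), ..., r_i(T));
    u_e is the fairness function, u_v the variability penalty
    (the same pair is used for every user here, for brevity).
    """
    total = 0.0
    for rewards in allocations:
        mean = sum(rewards) / len(rewards)
        qoe = mean - u_v(temporal_variance(rewards))  # user's QoE proxy
        total += u_e(qoe)
    return total


# Hypothetical choices, for illustration only.
u_e = lambda e: log(1.0 + e)   # concave, increasing for e > -1
u_v = lambda v: 0.25 * v       # linear penalty on temporal variance

steady = [[2.0, 2.0, 2.0, 2.0]]   # mean 2, variance 0
bursty = [[4.0, 0.0, 4.0, 0.0]]   # mean 2, variance 4

print(opt_objective(steady, u_e, u_v))
print(opt_objective(bursty, u_e, u_v))
```

Both allocations have the same mean reward, so any difference in the printed values comes from the variability penalty alone.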

A. Main contributions 

The main contribution of this paper is the development
of a simple asymptotically optimal online algorithm, Adaptive
Variability-aware Reward allocation (AVR), for realizing
mean-variance-fairness tradeoffs. The algorithm requires
almost no statistical information about the system, and its
characteristics are as follows:

(i) in each slot, $c_t$ is revealed, and AVR allocates rewards by
solving the optimization problem OPT-ONLINE given below:

$$\max_{\mathbf{r}} \ \sum_{i \in \mathcal{N}} (U_i^E)'(e_i(t)) \Big( r_i - (U_i^V)'(v_i(t)) \, (r_i - m_i(t))^2 \Big)$$

subject to $c_t(\mathbf{r}) \le 0$, $\mathbf{r} \ge 0$,

where $e_i(t) = m_i(t) - U_i^V(v_i(t))$ for each $i \in \mathcal{N}$ is an
estimate of the user's QoE based on estimated means and
variances; and,

(ii) it updates (vector) parameters $\mathbf{m}(t)$ and $\mathbf{v}(t)$ to keep
track of the mean and variance respectively associated with
the reward allocation under AVR.

Under stationary ergodic assumptions on the time-varying
constraints $(C_t)_t$, we show that AVR is asymptotically optimal,
i.e., achieves a performance equal to that of the offline
optimization OPT(T) introduced earlier as $T \to \infty$. This is a
strong optimality result, which at first sight may be surprising
due to the nature of dependency of $\mathrm{Var}^T(\cdot)$ (in the objective
of OPT(T)) on reward allocations over time and the time
varying nature of the constraints $(c_t)_t$. The key idea is to
exploit the characteristics of the problem, by keeping online 
estimates for the relevant quantities associated with users' 
allocations, e.g., the mean and variance which over time are 
shown to converge, and this eventually enables the online 
policy to produce allocations corresponding to the optimal 
stationary policy. Proving this result is somewhat challenging 
as it requires showing that the estimates based on allocations 
produced by our online policy, AVR, (which itself depends on 
the estimated quantities), will converge to the desired values. 
To our knowledge this is the first attempt to generalize the 
NUM framework in this direction. We contrast our problem 
formulation and approach to some of the past work in the 
literature addressing 'variability' minimization, risk-sensitive 
control and other MDP based frameworks in the related work 
below. 

B. Related Work 

Network Utility Maximization (NUM) provides the key 
conceptual framework to study how to fairly allocate rewards 
amongst a collection of users/entities. [24] provides a
network-centric overview of NUM. All the work on NUM, including
several major extensions (e.g., [12], [28], [27], [20]),
has ignored the impact of variability in reward allocation.
Our work [10] is one of the first to tackle network resource
allocation incorporating the impact of variability explicitly. In
particular, it addresses a special case of the problem studied in
this paper that only allows for linear functions $(U_i^E, U_i^V)_{i \in \mathcal{N}}$,
and proposes an asymptotically optimal online resource allocation
algorithm for a wireless network supporting video streaming
users. The algorithm proposed and analyzed in this
paper can be viewed as falling in the class of gradient based 
algorithms such as the ones proposed in [28] and [15]. How- 
ever, our approach for proving asymptotic optimality of such 
simple online gradient based schemes for 'convex' resource 
allocation problems (with objectives involving certain types of 
time averages) can be viewed as an important generalization 
of the approaches in [28] and [11]. In [28], the focus is on 
objectives such as (1), and does not allow for the addition of 
temporal variance to the objective. The approach in [11] relies 
on the use of results on sensitivity analysis of optimization 
problems, and only allows for linear (Uf) . ^ and concave 

WW 

Adding a temporal variance term in the cost takes the
objective out of the basic dynamic programming setting (even
when $(U_i^E, U_i^V)_{i \in \mathcal{N}}$ are all linear) as the overall cost is
not decomposable over time, i.e., cannot be written as a
sum of costs each dependent only on the allocation at that
time; this essentially is what makes sensitivity to variability
challenging. For risk sensitive decision making, MDP based
approaches aimed at realizing optimal tradeoffs between mean
and temporal variance in reward/cost were proposed in [7]
and [25]. While they consider a more general setting than
ours where actions can even affect the process $(C_t)_t$, the
approaches proposed in these works suffer from the curse
of dimensionality as they require solving large optimization
problems. For instance, the approach in [7] involves solving a
quadratic program in the (typically large) space of state-action
frequencies. Note that these approaches for risk sensitive
decision making are different from ones focusing on the
variance of the cumulative cost/reward such as the one in [17].

Variability or perceived variability could be measured in 
many different ways, and temporal variance considered in this 
paper is one of them. One could also 'reduce variability' using 
a minimum variance controller (see [1]) where we have certain 
target reward values fixed ahead of time and big fluctuations 
from these targets are undesirable. Note however that in using 
this approach, we have to fix our targets ahead of time, and 
thus lose the ability to realize tradeoffs between the mean 
and variability in reward allocation. One could also measure 
variability using switching costs like in [18], which considers 
the problem of achieving tradeoffs between average cost 
and time average switching cost associated with data center 
operation, and proposes algorithms with good performance 
guarantees for adversarial scenarios. The decision regarding 
how to measure variability should ultimately be based on the 
application setting under consideration. 

C. Organization of the paper 

Section II introduces the system model and assumptions. In 
Section III, we present and study the offline formulation for 
optimal variance sensitive joint reward allocation OPT(T). We 
start Section IV by formally introducing our online algorithm 
AVR and present a convergence result associated with it. This 
in turn serves as the basis for establishing the asymptotic 
optimality of AVR. Section V is devoted to the proof of AVR's 
convergence. We conclude the paper in Section VI. Proofs for 
some of the results presented in these sections are discussed 
in the appendices. 

II. System model 

We consider a slotted system where slots are indexed by
$t \in \{0, 1, 2, \ldots\}$, and the system serves a fixed set of users $\mathcal{N}$
with $N = |\mathcal{N}|$.

We assume that rewards are allocated subject to time
varying constraints. The reward allocation $\mathbf{r}(t) \in \mathbb{R}_+^N$ in slot
$t$ is constrained to satisfy the following inequality

$$c_t(\mathbf{r}(t)) \le 0,$$

where $c_t$ denotes the realization of a randomly selected function
$C_t$ from a (arbitrarily large) finite set $\mathcal{C}$ of real valued
maps on $\mathbb{R}_+^N$. We make the following assumptions on these
constraints:

Assumptions C.1-C.5 (Time varying constraints on rewards)

C.1 Let $(C_t)_t$ be a stationary ergodic process.

C.2 'Zero allocation' is always feasible, i.e., for any $c \in \mathcal{C}$,
$c(\mathbf{0}) \le 0$.

C.3 The feasible region corresponding to each constraint is
bounded: there is a constant $r_{\max} > 0$ such that for any $c \in \mathcal{C}$
and $\mathbf{r} \in \mathbb{R}_+^N$ satisfying $c(\mathbf{r}) \le 0$, we have $r_i \le r_{\max}$ for each
$i \in \mathcal{N}$.¹

C.4 Each function $c \in \mathcal{C}$ is convex and differentiable on an
open set containing $[0, r_{\max}]^N$.

C.5 There is a strictly feasible allocation in the interior of the
feasible region corresponding to each constraint. That is, for
each $c \in \mathcal{C}$, there is some (small vector) $\mathbf{r}^{feas}(c) \in (0, \infty)^N$
such that $c(\mathbf{r}^{feas}(c)) < 0$.

Let $(\pi(c) : c \in \mathcal{C})$ denote the marginal distribution associated
with the stationary ergodic process $(C_t)_t$, and let $C^{\pi}$ denote
a random constraint with distribution $(\pi(c) : c \in \mathcal{C})$.

As pointed out in C.1, we model the evolution of the
constraints over time as a stationary ergodic process. We
view $(C_t)_t$ as a random process where each $C_t$ can be
interchangeably viewed as a random function or an index selected
randomly from a finite set $\mathcal{C}$. Condition C.2 requires that the
zero allocation is feasible under all constraints picked from $\mathcal{C}$.
If condition C.3 holds, then we can upper bound any feasible
allocation under any constraint in $\mathcal{C}$ using $r_{\max} \mathbf{1}_N$ where $\mathbf{1}_N$
is the $N$ length vector with each component equal to one.
Condition C.4 ensures that the feasible sets are convex, and the
differentiability requirement simplifies the exposition. Finally,
condition C.5 is a technical condition useful in studying the
optimization problem OPT(T). An easy way to check this
condition is verifying the following condition:

C.5a There is some (small) constant $\delta_{feas} > 0$ such that
$c(\delta_{feas} \mathbf{1}_N) < 0$ for each $c \in \mathcal{C}$.

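Next we discuss the assumptions on the functions $(U_i^V)_{i \in \mathcal{N}}$
used in imposing the variability penalty. Let $v_{\max} = r_{\max}^2$.

Assumptions U.V: Variability penalty

U.V.1: For each $i \in \mathcal{N}$, $U_i^V$ is well defined and continuously
differentiable on an open set containing $[0, v_{\max}]$.

U.V.2: For each $i \in \mathcal{N}$ and any two elements $\mathbf{x}^1$ and $\mathbf{x}^2$ in
any Euclidean space $\mathbb{R}^d$ with $\mathbf{x}^1 \ne \mathbf{x}^2$, and $\alpha \in (0, 1)$ with
$\bar{\alpha} = 1 - \alpha$, we have

$$U_i^V\big( \| \alpha \mathbf{x}^1 + \bar{\alpha} \mathbf{x}^2 \|^2 \big) < \alpha\, U_i^V\big( \| \mathbf{x}^1 \|^2 \big) + \bar{\alpha}\, U_i^V\big( \| \mathbf{x}^2 \|^2 \big), \qquad (2)$$

where $\| \cdot \|$ denotes the Euclidean norm associated with the
space.

U.V.3: For each $i \in \mathcal{N}$, $U_i^V$ has strictly positive derivatives,
i.e., $\min_{v \in [0, v_{\max}]} (U_i^V)'(v) = d^V_{\min,i} > 0$.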

Note that any (not necessarily strictly) convex function satisfies
(2), but the condition is weaker than a convexity requirement.
For instance, using the triangle inequality, one can show that
$U_i^V(v_i) = \sqrt{v_i + \delta}$ for $\delta > 0$ satisfies all the conditions
described above for any $v_{\max}$. This function is not convex
but is useful as it transforms variance to (approximately) the
standard deviation for small enough $\delta > 0$. We will later see
that our algorithm (Section I-A) can be simplified if any of
the functions $U_i^V$ are linear. Hence, we define the following
subsets of $\mathcal{N}$:

$$\mathcal{N}_l = \{ i \in \mathcal{N} : U_i^V \text{ is linear} \},$$
$$\mathcal{N}_n = \{ i \in \mathcal{N} : U_i^V \text{ is not linear} \}.$$

¹ We could allow the constant $r_{\max}$ to be user dependent. But, we avoid
this for notational simplicity.
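The claim above that the square-root penalty satisfies condition (2) without being convex can be checked numerically. The sketch below is only a sanity check on random vectors, not a proof; the value of `DELTA`, the dimension, the sampling range and the helper names are arbitrary assumptions.

```python
import random
from math import sqrt

DELTA = 0.1  # assumed value of the small constant delta > 0


def u_v(v):
    """Square-root variability penalty: maps variance to roughly the std. deviation."""
    return sqrt(v + DELTA)


def sq_norm(x):
    """Squared Euclidean norm of a vector given as a list."""
    return sum(c * c for c in x)


def gap(x1, x2, alpha):
    """Right-hand side of (2) minus its left-hand side; positive means (2) holds."""
    mix = [alpha * a + (1 - alpha) * b for a, b in zip(x1, x2)]
    rhs = alpha * u_v(sq_norm(x1)) + (1 - alpha) * u_v(sq_norm(x2))
    return rhs - u_v(sq_norm(mix))


random.seed(0)
worst = min(
    gap(
        [random.uniform(-3, 3) for _ in range(4)],
        [random.uniform(-3, 3) for _ in range(4)],
        random.uniform(0.05, 0.95),
    )
    for _ in range(10000)
)
print("smallest gap over random trials:", worst)

# Non-convexity in v: the midpoint value exceeds the average of the endpoints.
second_difference = u_v(0.0) + u_v(2.0) - 2.0 * u_v(1.0)
print("second difference in v (negative means not convex):", second_difference)
```

The check works because $\sqrt{\|\mathbf{x}\|^2 + \delta}$ is the Euclidean norm of $\mathbf{x}$ extended by one extra coordinate $\sqrt{\delta}$, which is where the triangle inequality enters.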

Next we discuss the assumptions on the functions
$(U_i^E)_{i \in \mathcal{N}}$ used to impose fairness associated with the QoE
across users. Recall that the proxy for QoE for user $i$ is
$e_i(t) = m_i(t) - U_i^V(v_i(t))$, and let $e_{\min,i} = -U_i^V(v_{\max})$
and $e_{\max,i} = r_{\max} - U_i^V(0)$.



Assumptions U.E: Fairness in QoE

U.E.1: For each $i \in \mathcal{N}$, $U_i^E$ is defined and continuously
differentiable on an open set containing $[e_{\min,i}, e_{\max,i}]$.

U.E.2: For each $i \in \mathcal{N}$, $U_i^E$ is concave on $[e_{\min,i}, e_{\max,i}]$.

U.E.3: For each $i \in \mathcal{N}$, $U_i^E$ is strictly increasing, i.e.,
$(U_i^E)'(e_{\max,i}) > 0$ (which ensures that the derivative is
strictly positive on $[e_{\min,i}, e_{\max,i}]$).

For each $i \in \mathcal{N}$, although $U_i^E$ has to be defined over an
open set containing $[e_{\min,i}, e_{\max,i}]$, only the definition of the
function over $[e_{\min 0,i}, e_{\max,i}]$ affects the optimization where
$e_{\min 0,i} = -U_i^V(0)$. This is because we can achieve this value
of QoE for each user just by allocating $0 \cdot \mathbf{1}_N$ in each slot.
Thus, we can choose any function from the following class
of strictly concave increasing functions parametrized by
$\alpha \in (0, \infty)$ ([19])

$$U_\alpha(e) = \begin{cases} \log(e) & \text{if } \alpha = 1, \\ (1 - \alpha)^{-1} e^{1 - \alpha} & \text{otherwise,} \end{cases} \qquad (3)$$

and can satisfy U.E by making minor modifications to the
function. For instance, we can use the following modification
$U_i^{E,\log}$ of the log function for any (small) $\delta > 0$:
$U_i^{E,\log}(e) = \log(e - e_{\min,i} + \delta)$, $e \in [e_{\min,i}, e_{\max,i}]$.
The above class of functions is commonly used to enforce fairness
to obtain allocations that are $\alpha$-fair (see [24]). A larger $\alpha$
corresponds to a more fair allocation that eventually becomes
max-min fair as $\alpha$ goes to infinity.
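The α-fair family in (3) and the shifted-log modification can be written down directly. The sketch below is illustrative only; the function names are hypothetical, and the numerical arguments in the usage lines are arbitrary.

```python
from math import log


def alpha_fair_utility(e, alpha):
    """alpha-fair utility of (3): log(e) if alpha == 1, else e^(1-alpha)/(1-alpha).

    Defined here for e > 0 and alpha > 0.
    """
    if e <= 0 or alpha <= 0:
        raise ValueError("requires e > 0 and alpha > 0")
    if alpha == 1:
        return log(e)
    return e ** (1 - alpha) / (1 - alpha)


def shifted_log_utility(e, e_min, delta):
    """Log utility shifted so it is well defined on [e_min, e_max] for delta > 0."""
    return log(e - e_min + delta)


print(alpha_fair_utility(4.0, 0.5))   # alpha < 1
print(alpha_fair_utility(4.0, 1))     # proportional-fair (log) case
print(alpha_fair_utility(4.0, 2.0))   # alpha > 1
print(shifted_log_utility(-1.0, e_min=-1.0, delta=1.0))
```

The shift by `e_min` and `delta` keeps the argument of the logarithm strictly positive over the whole interval, which is what lets the modified function meet the differentiability requirement in U.E.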

Our focus in this paper is on obtaining an algorithm for
reward allocation that can be implemented at a centralized
coordinator that has access to $c_t$ at the beginning of slot $t$.

Scope of the model 

Time varying capacity constraints: We close this section
by illustrating the wide scope of the framework discussed
above by describing examples of scenarios that fit it nicely.
The presence of time varying constraints $c_t(\mathbf{r}) \le 0$ allows us
to apply the model to several interesting and useful settings.
In particular, we discuss three wireless network settings WN,
WN-E and WN-T, and show that the model can handle problems
involving time varying exogenous constraints and time
varying utility functions. We start by discussing WN where
the reward in a slot is the rate allocated to the user in that
slot. Let $\mathcal{P}$ denote a finite (but arbitrarily large) set of positive
vectors where each vector corresponds to a peak transmission
rate vector for a slot seen by users in a wireless network.
Let $\mathcal{C} = \big\{ c_{\mathbf{p}} : c_{\mathbf{p}}(\mathbf{r}) = \sum_{i \in \mathcal{N}} \frac{r_i}{p_i} - 1,\ \mathbf{p} \in \mathcal{P} \big\}$. Here, for
any allocation $\mathbf{r}$, $r_i / p_i$ is the fraction of time the wireless
system needs to serve user $i$ in slot $t$ to deliver data at the
rate of $r_i$ to user $i$ in a slot where the user has peak data
transmission rate $p_i$. Thus, the constraint $c_{\mathbf{p}}(\mathbf{r}) \le 0$ can
be seen as a scheduling constraint that corresponds to the
requirement that the sum of the fractions of time that different
users are served in a slot should be less than or equal to one.
We can verify that by choosing $r_{\max} = \max_{\mathbf{p} \in \mathcal{P}} \max_{i \in \mathcal{N}} p_i$
and $\delta_{feas} = \frac{1}{2N} \min_{\mathbf{p} \in \mathcal{P}} \min_{i \in \mathcal{N}} p_i$, we satisfy C.2-C.5a.
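The verification mentioned above for the WN setting can be sketched in a few lines. The peak-rate vectors below are hypothetical, and the particular value used for `delta_feas` (the smallest peak rate divided by 2N) is one choice that makes the check pass, stated here as an assumption rather than as the only valid constant.

```python
def make_constraint(p):
    """Scheduling constraint c_p(r) = sum_i r_i / p_i - 1 for peak-rate vector p."""
    def c(r):
        return sum(r_i / p_i for r_i, p_i in zip(r, p)) - 1.0
    return c


# Hypothetical finite set P of peak-rate vectors for N = 2 users.
peak_rate_vectors = [(2.0, 4.0), (1.0, 8.0), (3.0, 3.0)]
N = 2
constraints = [make_constraint(p) for p in peak_rate_vectors]

r_max = max(max(p) for p in peak_rate_vectors)
delta_feas = min(min(p) for p in peak_rate_vectors) / (2 * N)  # assumed choice

# C.2: the zero allocation is feasible under every constraint.
zero_feasible = all(c([0.0] * N) <= 0 for c in constraints)

# C.5a: delta_feas * 1_N is strictly feasible under every constraint.
strictly_feasible = all(c([delta_feas] * N) < 0 for c in constraints)

# C.3 (spot check): giving any single user more than r_max is infeasible.
bounded = all(
    c([r_max + 0.1 if j == i else 0.0 for j in range(N)]) > 0
    for c in constraints
    for i in range(N)
)

print(r_max, delta_feas, zero_feasible, strictly_feasible, bounded)
```

Convexity and differentiability (C.4) hold by inspection here because each constraint is affine in the allocation.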

Time-varying exogenous constraints: We can also
allow for time varying exogenous constraints on the
wireless system by appropriately defining the set $\mathcal{C}$. For
instance, consider case WN-E where a base station in a
cellular network allocates rates to users some of whom
are streaming videos. As pointed out above, the QoE of users
viewing video content is sensitive to temporal variability
in quality. But, while allocating rates to these users, we
may also have to account for the time varying resource
requirements of the voice and data traffic handled by the
base station. We can deal with this constraint by defining

$$\mathcal{C} = \Big\{ c_{(\mathbf{p}, f)} : c_{(\mathbf{p}, f)}(\mathbf{r}) = \sum_{i \in \mathcal{N}} \frac{r_i}{p_i} - (1 - f),\ \mathbf{p} \in \mathcal{P},\ f \in \mathcal{F} \Big\},$$

where $\mathcal{F}$ is a finite set of real numbers in $[0, 1)$ where
each element in the set corresponds to a fraction of
a slot's time that is utilized by the voice and data
traffic. Let $f_{\max} = \max_{f \in \mathcal{F}} f$. Then, we can verify
that by choosing $r_{\max} = \max_{\mathbf{p} \in \mathcal{P}} \max_{i \in \mathcal{N}} p_i$ and
$\delta_{feas} = \frac{1}{2N} \min_{\mathbf{p} \in \mathcal{P}} \min_{i \in \mathcal{N}} p_i \, (1 - f_{\max})$, we satisfy
C.2-C.5a.

Time varying utility functions: For users streaming video
content discussed in the case WN-E, it is more appropriate
to view the perceived video quality of a user in a slot as the
reward for that user in that slot. However, for users streaming
video content, the dependence of perceived video quality (in a
short duration slot roughly a second long which corresponds to
a collection of 20-30 frames) on the compression rate is time
varying. This is typically due to the possibly changing nature
of the content, e.g., from an action to a slower scene. Hence,
the 'utility' function that gives the reward (i.e., perceived video
quality) derived from the allocated resource (i.e., the rate) is time
varying. This setting, referred to as WN-T, can be handled as
follows. Let $q_{t,i}(w_i)$ denote the strictly increasing concave
function that, in slot $t$, maps the rate $w_i$ allocated to user $i$
to the perceived video quality. For each user $i$, let $\mathcal{Q}_i$ be a
finite set of such functions. Hence, we can view WN-T as a
case that has the following set of constraints:

$$\mathcal{C} = \Big\{ c_{(\mathbf{p}, \mathbf{q})} : c_{(\mathbf{p}, \mathbf{q})}(\mathbf{r}) = \sum_{i \in \mathcal{N}} \frac{q_i^{-1}(r_i)}{p_i} - 1,\ \mathbf{p} \in \mathcal{P},\ q_i \in \mathcal{Q}_i \ \forall\, i \in \mathcal{N} \Big\}.$$

Note that each element in $\mathcal{C}$ is a convex function. If we
assume that each function $q \in \mathcal{Q}_i$ is differentiable and convex
with $q(0) = 0$ (which are very reasonable assumptions on
the dependence between quality and compression rate), then
we can verify that by choosing a small enough $\delta_{feas}$ and
$r_{\max} = \max_{\mathbf{p} \in \mathcal{P}} \max_{i \in \mathcal{N}} \max_{q_i \in \mathcal{Q}_i} q_i(p_i)$, we satisfy C.2-
C.5a.

To summarize, our framework allows substantial freedom in 
modeling temporal variability in both the available resources 
and the sensitivity of the users' reward/utility to their alloca- 
tions, as well as fairness across their QoE. 

III. Optimal Variance-Sensitive Offline Policy 

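In this section, we study OPT(T), the offline formulation for
optimal joint reward allocation introduced in Section I. In the
offline setting, we assume that $(c)_{1:T}$, the realization of the
process $(C)_{1:T}$, is known. We denote the objective function
of OPT(T) by $\phi_T$, i.e.,

$$\phi_T\big((\mathbf{r})_{1:T}\big) = \sum_{i \in \mathcal{N}} U_i^E\Big( \frac{1}{T} \sum_{t=1}^{T} r_i(t) - U_i^V\big(\mathrm{Var}^T((r_i)_{1:T})\big) \Big),$$

and $(U_i^E)_{i \in \mathcal{N}}$ and $(U_i^V)_{i \in \mathcal{N}}$ are functions satisfying U.E
and U.V respectively, and (recall that)
$\mathrm{Var}^T((r_i)_{1:T}) = \frac{1}{T} \sum_{t=1}^{T} \big( r_i(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i(\tau) \big)^2$.
Hence the optimization problem OPT(T) can be rewritten as:

$$\max_{(\mathbf{r})_{1:T}} \ \phi_T\big((\mathbf{r})_{1:T}\big) \qquad (4)$$
$$\text{subject to } c_t(\mathbf{r}(t)) \le 0 \quad \forall\, t \in \{1, \ldots, T\}, \qquad (5)$$
$$r_i(t) \ge 0 \quad \forall\, t \in \{1, \ldots, T\},\ \forall\, i \in \mathcal{N}, \qquad (6)$$

where $c_t \in \mathcal{C}$ is a convex function for each $t$.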

The next result asserts that OPT(T) is a convex optimization 
problem satisfying Slater's condition (Section 5.2.3, [4]) and 
that it has a unique solution. 

Lemma 1. OPT(T) is a convex optimization problem satisfy- 
ing Slater's condition with a unique solution. 

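Proof: Since we made the assumptions U.E and U.V, the
concavity of the objective of OPT(T) is easy to establish once
we prove the convexity of the function $U_i^V(\mathrm{Var}^T(\cdot))$ for each
$i \in \mathcal{N}$. Using (2) and the definition of $\mathrm{Var}^T(\cdot)$, we can show
that $U_i^V(\mathrm{Var}^T(\cdot))$ is a convex function for each $i \in \mathcal{N}$. The
details are given next. For two different reward allocations $(\mathbf{r}^1)_{1:T}$
and $(\mathbf{r}^2)_{1:T}$, any $i \in \mathcal{N}$, $\alpha \in (0, 1)$ and $\bar{\alpha} = 1 - \alpha$, we have
that

$$\mathrm{Var}^T\big( \alpha (r_i^1)_{1:T} + \bar{\alpha} (r_i^2)_{1:T} \big) = \mathrm{Var}^T\big( (\alpha r_i^1 + \bar{\alpha} r_i^2)_{1:T} \big)$$
$$= \frac{1}{T} \sum_{t=1}^{T} \Big( \big( \alpha r_i^1(t) + \bar{\alpha} r_i^2(t) \big) - \frac{1}{T} \sum_{\tau=1}^{T} \big( \alpha r_i^1(\tau) + \bar{\alpha} r_i^2(\tau) \big) \Big)^2 .$$

Using (2) (applied to the centered sequences scaled by $1/\sqrt{T}$),
we have that

$$U_i^V\Big( \mathrm{Var}^T\big( \alpha (r_i^1)_{1:T} + \bar{\alpha} (r_i^2)_{1:T} \big) \Big)$$
$$\le \alpha\, U_i^V\Big( \frac{1}{T} \sum_{t=1}^{T} \Big( r_i^1(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^1(\tau) \Big)^2 \Big) + \bar{\alpha}\, U_i^V\Big( \frac{1}{T} \sum_{t=1}^{T} \Big( r_i^2(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^2(\tau) \Big)^2 \Big)$$
$$= \alpha\, U_i^V\big( \mathrm{Var}^T((r_i^1)_{1:T}) \big) + \bar{\alpha}\, U_i^V\big( \mathrm{Var}^T((r_i^2)_{1:T}) \big).$$

Thus, $U_i^V(\mathrm{Var}^T(\cdot))$ is a convex function. Thus, by the
concavity of $U_i^E(\cdot)$ and $-U_i^V(\mathrm{Var}^T(\cdot))$, we can conclude that
OPT(T) is a convex optimization problem.

Note that, from (2) (since we have a strict inequality), the
inequality above is a strict one unless

$$r_i^1(t) = r_i^2(t) + \frac{1}{T} \sum_{\tau=1}^{T} r_i^1(\tau) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^2(\tau) \quad \forall\, 1 \le t \le T.$$

Thus, for the inequality not to be a strict one, we require that

$$\mathrm{Var}^T((r_i^1)_{1:T}) = \mathrm{Var}^T((r_i^2)_{1:T}).$$

Further, Slater's condition
is satisfied and it mainly follows from the assumption C.5.

Now, for any $i \in \mathcal{N}$, $U_i^E$ and $-U_i^V(\mathrm{Var}^T(\cdot))$ are not
necessarily strictly concave. But, we can still show that the
optimal solution is unique as follows. Let $(\mathbf{r}^1)_{1:T}$ and
$(\mathbf{r}^2)_{1:T}$ be two optimal solutions to OPT(T). Then, from the
concavity of the objective, $\alpha (\mathbf{r}^1)_{1:T} + \bar{\alpha} (\mathbf{r}^2)_{1:T}$ is also an
optimal solution for any $\alpha \in (0, 1)$ and $\bar{\alpha} = 1 - \alpha$. Due to
convexity of $U_i^V(\mathrm{Var}^T(\cdot))$, this is only possible if for each
$i \in \mathcal{N}$,
$U_i^V\big( \mathrm{Var}^T( \alpha (r_i^1)_{1:T} + \bar{\alpha} (r_i^2)_{1:T} ) \big) = \alpha\, U_i^V\big( \mathrm{Var}^T((r_i^1)_{1:T}) \big) + \bar{\alpha}\, U_i^V\big( \mathrm{Var}^T((r_i^2)_{1:T}) \big)$.

From the above discussion,
$U_i^V\big( \mathrm{Var}^T( \alpha (r_i^1)_{1:T} + \bar{\alpha} (r_i^2)_{1:T} ) \big)$ is equal to
$\alpha\, U_i^V\big( \mathrm{Var}^T((r_i^1)_{1:T}) \big) + \bar{\alpha}\, U_i^V\big( \mathrm{Var}^T((r_i^2)_{1:T}) \big)$ for each
$i \in \mathcal{N}$ only if $\mathrm{Var}^T((r_i^1)_{1:T}) = \mathrm{Var}^T((r_i^2)_{1:T})$ for each
$i \in \mathcal{N}$, and $r_i^1(t) = r_i^2(t) + \frac{1}{T} \sum_{\tau=1}^{T} r_i^1(\tau) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^2(\tau)$
for each $i \in \mathcal{N}$ and $1 \le t \le T$. Since for each $i \in \mathcal{N}$,
$\mathrm{Var}^T((r_i^1)_{1:T}) = \mathrm{Var}^T((r_i^2)_{1:T})$, due to optimality of
$(\mathbf{r}^1)_{1:T}$ and $(\mathbf{r}^2)_{1:T}$, we have that

$$\sum_{i \in \mathcal{N}} U_i^E\Big( \frac{1}{T} \sum_{t=1}^{T} r_i^1(t) - U_i^V\big( \mathrm{Var}^T((r_i^1)_{1:T}) \big) \Big) = \sum_{i \in \mathcal{N}} U_i^E\Big( \frac{1}{T} \sum_{t=1}^{T} r_i^2(t) - U_i^V\big( \mathrm{Var}^T((r_i^2)_{1:T}) \big) \Big).$$

Since $U_i^E$ are strictly increasing functions for each $i \in \mathcal{N}$, the
above equation implies that

$$\frac{1}{T} \sum_{\tau=1}^{T} r_i^1(\tau) = \frac{1}{T} \sum_{\tau=1}^{T} r_i^2(\tau),$$

and thus,

$$r_i^1(t) = r_i^2(t) \quad \forall\, 1 \le t \le T,\ \forall\, i \in \mathcal{N}.$$

From the above discussion, we can conclude that OPT(T) has
a unique solution. ■

We let $(\mathbf{r}^T)_{1:T}$ denote the optimal solution to OPT(T).
Since OPT(T) is a convex optimization problem satisfying
Slater's condition (Lemma 1), the Karush-Kuhn-Tucker (KKT)
conditions ([4]) given next are necessary and sufficient for
optimality. Let $m_i^T = \frac{1}{T} \sum_{t=1}^{T} r_i^T(t)$.

KKT-OPT(T):

$(\mathbf{r}^T)_{1:T}$ is an optimal solution to OPT(T) if and only if it is
feasible, and there exist non-negative constants $(\mu^T)_{1:T}$ and
$(\gamma_i^T : i \in \mathcal{N})_{1:T}$ such that for all $i \in \mathcal{N}$ and $t \in \{1, \ldots, T\}$,
we have

$$\frac{(U_i^E)'\big( m_i^T - U_i^V( \mathrm{Var}^T((r_i^T)_{1:T}) ) \big)}{T} \Big( 1 - 2\, (U_i^V)'\big( \mathrm{Var}^T((r_i^T)_{1:T}) \big) \big( r_i^T(t) - m_i^T \big) \Big) - \frac{\mu^T(t)}{T}\, c'_{t,i}(\mathbf{r}^T(t)) + \frac{\gamma_i^T(t)}{T} = 0, \qquad (7)$$
$$\mu^T(t)\, c_t(\mathbf{r}^T(t)) = 0, \qquad (8)$$
$$\gamma_i^T(t)\, r_i^T(t) = 0. \qquad (9)$$

Here $c'_{t,i}$ denotes $\partial c_t / \partial r_i$, and we have used the fact that for any
$i \in \mathcal{N}$ and $t \in \{1, \ldots, T\}$,

$$\frac{\partial}{\partial r_i(t)} \Big( T\, \mathrm{Var}^T((r_i)_{1:T}) \Big) = 2 \Big( r_i(t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i(\tau) \Big).$$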



From (7), we see that the optimal reward allocation $\mathbf{r}^T(t)$ in
any time slot $t$ depends on the entire allocation $(\mathbf{r}^T)_{1:T}$ only
through the following three quantities associated with $(\mathbf{r}^T)_{1:T}$:
(i) the time average reward $\mathbf{m}^T$, (ii) $\big((U_i^E)'\big)_{i \in \mathcal{N}}$ evaluated
at the quality of experience of the respective users, and (iii)
$\big((U_i^V)'\big)_{i \in \mathcal{N}}$ evaluated at the variance seen by the respective
users. So, if a genie were to reveal only these time average
quantities associated with the optimal solution, the optimal
allocation for each slot $t$ could be determined by solving an
optimization that only requires knowledge of $c_t$ (associated
with the current slot) and not the entire $(c)_{1:T}$. We exploit this key
idea in formulating our online algorithm in the next section.
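The KKT conditions rely on the fact that the partial derivative of T times the temporal variance with respect to the reward in slot t equals twice the deviation of that reward from the time average. The following sketch checks this identity by central finite differences; the reward sequence and the step size are arbitrary assumptions used only for the check.

```python
def temporal_variance(rewards):
    """Time average of the squared deviation from the time-average reward."""
    T = len(rewards)
    mean = sum(rewards) / T
    return sum((r - mean) ** 2 for r in rewards) / T


def analytic_gradient(rewards):
    """Claimed gradient of T * Var^T: entry t is 2 * (r(t) - mean)."""
    mean = sum(rewards) / len(rewards)
    return [2.0 * (r - mean) for r in rewards]


def numeric_gradient(rewards, h=1e-6):
    """Central finite-difference gradient of T * Var^T."""
    T = len(rewards)
    grad = []
    for t in range(T):
        up = list(rewards)
        down = list(rewards)
        up[t] += h
        down[t] -= h
        grad.append(T * (temporal_variance(up) - temporal_variance(down)) / (2 * h))
    return grad


rewards = [0.3, 1.7, 0.9, 2.4, 1.1]  # arbitrary test sequence
max_error = max(
    abs(a - n) for a, n in zip(analytic_gradient(rewards), numeric_gradient(rewards))
)
print("largest discrepancy:", max_error)
```

Because T times the variance is a quadratic function of the rewards, the central difference agrees with the analytic expression up to floating-point rounding.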



IV. Adaptive Variance aware Reward allocation 

The reward allocations for AVR are obtained by solving OPTAVR(m, v, c) given below:

$$\max_{r} \ \sum_{i \in \mathcal{N}} (U_i^E)'(e_i) \left( r_i - (U_i^V)'(v_i)\, (r_i - m_i)^2 \right)$$

subject to $c(r) \le 0$, (10)
$r_i \ge 0 \quad \forall\, i \in \mathcal{N}$, (11)

where $e_i = m_i - U_i^V(v_i)$ $\forall\, i \in \mathcal{N}$. Here m, v and e correspond to current estimates of the mean, variance and QoE respectively. Note that OPTAVR(m, v, c) is closely related to OPT-ONLINE (discussed in Subsection I-A). Let $r^*(m, v, c)$ denote the optimal solution to OPTAVR(m, v, c).
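To make the structure of OPTAVR concrete, the following is a minimal sketch of a solver for one special case. It assumes (this is not from the paper) a single linear capacity constraint $c(r) = \sum_i r_i - \text{capacity}$, so that the per-user stationarity condition can be inverted in closed form and the multiplier found by bisection. The function name `solve_optavr` and the arguments `a`, `b` (standing for $(U_i^E)'(e_i)$ and $(U_i^V)'(v_i)$) are hypothetical.

```python
def solve_optavr(a, b, m, capacity, tol=1e-10):
    """Maximise sum_i a[i] * (r[i] - b[i] * (r[i] - m[i])**2)
    subject to sum(r) <= capacity and r[i] >= 0.

    a[i] > 0 plays the role of (U_i^E)'(e_i), b[i] > 0 that of (U_i^V)'(v_i),
    and m[i] >= 0 is the current mean estimate. The linear capacity
    constraint is an assumption made for this sketch only.
    """
    def alloc(mu):
        # Stationarity with multiplier mu for the capacity constraint and
        # non-negativity handled by clipping:
        # r_i = max(0, m_i + (1 - mu / a_i) / (2 b_i)).
        return [max(0.0, mi + (1.0 - mu / ai) / (2.0 * bi))
                for ai, bi, mi in zip(a, b, m)]

    r = alloc(0.0)
    if sum(r) <= capacity:
        return r, 0.0  # constraint is slack, so the multiplier is zero

    # At mu = hi every user receives zero, so the constraint holds there.
    lo = 0.0
    hi = max(ai * (1.0 + 2.0 * bi * mi) for ai, bi, mi in zip(a, b, m))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(alloc(mid)) > capacity:
            lo = mid
        else:
            hi = mid
    return alloc(hi), hi


# Usage with two symmetric users (all numbers are illustrative).
r_tight, mu_tight = solve_optavr([1.0, 1.0], [1.0, 1.0], [1.0, 1.0], capacity=2.0)
r_slack, mu_slack = solve_optavr([1.0, 1.0], [1.0, 1.0], [1.0, 1.0], capacity=10.0)
```

Bisection is used here only because the total allocation is non-increasing in the multiplier for this separable case; a general convex constraint $c(r)$ would need a generic convex solver instead.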

Next, we describe our AVR algorithm in detail. AVR consists of three steps, AVR.0-AVR.2, given next:

Adaptive Variance aware Reward allocation (AVR)

AVR.0: Initialization: let $(m(1), v(1)) \in \mathcal{H}$.

In each slot $t$ for $t \ge 1$, carry out the following steps:

AVR.1: The reward allocation in slot $t$ is the optimal solution to OPTAVR$(m(t), v(t), c_t)$, i.e., $r^*(m(t), v(t), c_t)$, and will be denoted by $r^*(t)$ (when the dependence on the variables is clear from context).

AVR.2: In slot $t$, update $m_i$ as follows: for all $i \in \mathcal{N}$,

$$m_i(t+1) = \left[ m_i(t) + \frac{1}{t}\left( r_i^*(t) - m_i(t) \right) \right]_0^{r_{\max}}, \quad (12)$$

and update $v_i$ as follows: for all $i \in \mathcal{N}$,

$$v_i(t+1) = \left[ v_i(t) + \frac{1}{t}\left( (r_i^*(t) - m_i(t))^2 - v_i(t) \right) \right]_0^{v_{\max}}. \quad (13)$$



Here, $[x]_a^b = \min(\max(x, a), b)$. We see that the update equations (12)-(13) roughly ensure that the parameters $m(t)$ and $v(t)$ keep track of the mean reward and the variance in reward allocation, respectively, associated with the reward allocation under AVR. Also, note that we do not have to keep track of the estimates of variance of users $i$ with linear $U_i^V$ since OPTAVR is insensitive to their values (i.e., $(U_i^V)'(\cdot)$ is a constant), and thus the evolution of $m(t)$ and of the variance estimates of the remaining users does not depend on them. We let $\theta(t) = (m(t), v(t))$ for each $t$. The update equations (12)-(13) ensure that $\theta(t)$ stays in the set $\mathcal{H}$ given by:

$$\mathcal{H} = [0, r_{\max}]^N \times [0, v_{\max}]^N,$$

where $\times$ denotes the cross product operator for sets.
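The steps AVR.0-AVR.2 can be sketched in code for a toy single-user instance. Everything specific below is an assumption made for illustration and is not taken from the paper: one user, a constraint of the form $0 \le r \le \text{cap}_t$, the variability penalty $U^V(v) = \beta\sqrt{v + \kappa}$, and a capacity sequence that alternates between 1 and 3. With a single user, OPTAVR reduces to maximising $r - (U^V)'(v)(r - m)^2$ on an interval, whose solution is the clipped value $m + 1/(2 (U^V)'(v))$; the positive factor $(U^E)'(e)$ only scales the objective.

```python
import math

BETA, KAPPA = 2.0, 0.75   # hypothetical parameters of U^V(v) = BETA * sqrt(v + KAPPA)
R_MAX = 3.0               # largest capacity in this toy model
V_MAX = R_MAX ** 2


def clip(x, lo, hi):
    """The operator [x]_lo^hi = min(max(x, lo), hi)."""
    return min(max(x, lo), hi)


def optavr_single_user(m, v, cap):
    """Closed-form OPTAVR solution for one user with 0 <= r <= cap."""
    slope = BETA / (2.0 * math.sqrt(v + KAPPA))   # (U^V)'(v)
    return clip(m + 1.0 / (2.0 * slope), 0.0, cap)


def run_avr(capacities, m=0.0, v=0.0):
    """Run AVR.1 and AVR.2 over a capacity sequence; return final (m, v) and rewards."""
    rewards = []
    for t, cap in enumerate(capacities, start=1):
        r = optavr_single_user(m, v, cap)                       # AVR.1
        m_next = clip(m + (r - m) / t, 0.0, R_MAX)              # update (12)
        v_next = clip(v + ((r - m) ** 2 - v) / t, 0.0, V_MAX)   # update (13)
        m, v = m_next, v_next
        rewards.append(r)
    return m, v, rewards


# Capacity alternates 1, 3, 1, 3, ... so each state has time-average frequency 1/2.
caps = [1.0 if t % 2 else 3.0 for t in range(1, 200001)]
m_final, v_final, rewards = run_avr(caps)
```

For this toy model one can check by hand that the pair $m = 1.5$, $v = 0.25$ is consistent with the allocation it induces (rewards 1 and 2 in the two states), so the estimates should settle near those values, slowly, because of the $1/t$ step size.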

For any $(m, v, c) \in \mathcal{H} \times \mathcal{C}$, we have $(U_i^E)'\left(m_i - U_i^V(v_i)\right) (U_i^V)'(v_i) > 0$ for each $i \in \mathcal{N}$ (see assumptions U.E and U.V). Hence, OPTAVR(m, v, c) is a convex optimization problem with a unique solution. Further, using Assumption C.5, we can show that it satisfies Slater's condition. Hence, the optimal solution for OPTAVR(m, v, c) satisfies the KKT conditions given below.

KKT-OPTAVR(m, v, c):

There exist non-negative constants $\mu^*$ and $(\gamma_i^* : i \in \mathcal{N})$ such that for all $i \in \mathcal{N}$

$$(U_i^E)'\left(m_i - U_i^V(v_i)\right) \left( 1 - 2 (U_i^V)'(v_i) (r_i^* - m_i) \right) - \mu^* c_i'(r^*) + \gamma_i^* = 0, \quad (14)$$
$$\mu^* c(r^*) = 0, \quad (15)$$
$$\gamma_i^* r_i^* = 0. \quad (16)$$
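As a worked step, the stationarity condition can be read off from a Lagrangian for OPTAVR. The sketch below assumes the sign conventions implied by (10)-(11), with a multiplier $\mu \ge 0$ for $c(r) \le 0$ and multipliers $\gamma_i \ge 0$ for $r_i \ge 0$; the symbol $\mathcal{L}$ is introduced here only for illustration.

```latex
\mathcal{L}(r, \mu, \gamma)
  = \sum_{i \in \mathcal{N}} (U_i^E)'(e_i)\left( r_i - (U_i^V)'(v_i)\,(r_i - m_i)^2 \right)
    - \mu\, c(r) + \sum_{i \in \mathcal{N}} \gamma_i r_i ,
\qquad
\frac{\partial \mathcal{L}}{\partial r_i}
  = (U_i^E)'(e_i)\left( 1 - 2\,(U_i^V)'(v_i)\,(r_i - m_i) \right)
    - \mu\, c_i'(r) + \gamma_i .
```

Setting the partial derivative to zero at $r^*$ with $e_i = m_i - U_i^V(v_i)$ gives the stationarity condition, and complementary slackness for the two constraint families gives the remaining two conditions.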



In the next lemma, we establish continuity and differentiability properties of $r^*(m, v, c)$ (also denoted by $r^*$ in the result) when viewed as a function of $(m, v)$.

Lemma 2. For any $c \in \mathcal{C}$, and $\theta = (m, v) \in \mathcal{H}$

(a) $r^*(\theta, c)$ is a continuous function of $\theta$.

(b) $E[r^*(\theta, C^\pi)]$ is a continuous function of $\theta$.

Proof: Proofs of parts (a) and (b) mainly rely on some fundamental results on perturbation analysis of optimization problems from [6] and [2]. Part (a) can be proved using Theorem 2.2 in [6]. Part (b) can be shown using (a), and the Bounded Convergence Theorem (see [9]). ■

The next Theorem states important results related to the convergence of the mean, variance and QoE of the reward allocations under AVR. This result will be proven in Section V. For brevity, we let $r^*(t)$ denote $r^*(m(t), v(t), c_t)$.

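Theorem 1. For almost all sample paths, and for each $i \in \mathcal{N}$,

(a) $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} r_i^*(t) = \lim_{t \to \infty} m_i(t)$,

(b) $\lim_{T \to \infty} \mathrm{Var}^T((r_i^*)_{1:T}) = \lim_{t \to \infty} v_i(t)$,

(c) $\lim_{T \to \infty} \left( \frac{1}{T} \sum_{t=1}^{T} r_i^*(t) - U_i^V\left(\mathrm{Var}^T((r_i^*)_{1:T})\right) \right) = \lim_{t \to \infty} \left( m_i(t) - U_i^V(v_i(t)) \right)$.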

Asymptotic Optimality of AVR:

The next result establishes the asymptotic optimality of AVR, i.e., if we consider long periods of time $T$, the difference in performance of AVR and the optimal offline policy OPT(T) becomes negligible.

Theorem 2. The allocation $(r^*)_{1:T}$ associated with AVR is feasible, i.e., it satisfies (5) and (6). Also, for almost all sample paths AVR is asymptotically optimal, i.e.,

$$\lim_{T \to \infty} \left( \phi_T((r^*)_{1:T}) - \phi_T((r^T)_{1:T}) \right) = 0.$$



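Proof: Since the allocation $(r^*)_{1:T}$ associated with AVR satisfies (10) and (11) in each time slot, it also satisfies (5) and (6).

To show asymptotic optimality, consider any realization of $(c_t)_{1:T}$. Let $(\mu^*)_{1:T}$ and $(\gamma_i^* : i \in \mathcal{N})_{1:T}$ be the sequences of non-negative real numbers satisfying (14), (15) and (16) for the realization. Hence, from the non-negativity of these numbers, and feasibility of $(r^T)_{1:T}$, we have

$$\phi_T((r^T)_{1:T}) \le \psi_T((r^T)_{1:T}),$$

where

$$\psi_T((r)_{1:T}) = \sum_{i \in \mathcal{N}} U_i^E\left( \frac{1}{T}\sum_{t=1}^{T} r_i(t) - U_i^V\left(\mathrm{Var}^T((r_i)_{1:T})\right) \right) - \sum_{t=1}^{T} \frac{\mu^*(t)}{T} c_t(r(t)) + \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} \frac{\gamma_i^*(t)}{T} r_i(t).$$

The function $\psi_T$ is the Lagrangian associated with OPT(T) but evaluated at the optimal Lagrange multipliers associated with the optimization problems (OPTAVR) involved in AVR, and hence the inequality. Since $\psi_T$ is a differentiable concave function, we have (see [4])

$$\psi_T((r^T)_{1:T}) \le \psi_T((r^*)_{1:T}) + \nabla \psi_T((r^*)_{1:T}) \cdot \left( (r^T)_{1:T} - (r^*)_{1:T} \right),$$

where '$\cdot$' denotes the dot product. Hence, writing $\bar{e}_i^T = \frac{1}{T}\sum_{t=1}^{T} r_i^*(t) - U_i^V\left(\mathrm{Var}^T((r_i^*)_{1:T})\right)$ for brevity, we have

$$\phi_T((r^T)_{1:T}) \le \psi_T((r^T)_{1:T}) \le \sum_{i \in \mathcal{N}} U_i^E\left( \bar{e}_i^T \right) - \sum_{t=1}^{T} \frac{\mu^*(t)}{T} c_t(r^*(t)) + \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} \frac{\gamma_i^*(t)}{T} r_i^*(t)$$
$$\quad + \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} \left( r_i^T(t) - r_i^*(t) \right) \left( \frac{(U_i^E)'(\bar{e}_i^T)}{T} \left( 1 - 2 (U_i^V)'\left(\mathrm{Var}^T((r_i^*)_{1:T})\right) \left( r_i^*(t) - \frac{1}{T}\sum_{\tau=1}^{T} r_i^*(\tau) \right) \right) - \frac{\mu^*(t)}{T} c'_{t,i}(r^*(t)) + \frac{\gamma_i^*(t)}{T} \right).$$

Now, since $(\mu^*)_{1:T}$ and $(\gamma_i^* : i \in \mathcal{N})_{1:T}$ satisfy (14), (15) and (16), the terms involving the multipliers can be eliminated, and we have

$$\phi_T((r^T)_{1:T}) \le \phi_T((r^*)_{1:T}) + \sum_{t=1}^{T} \sum_{i \in \mathcal{N}} \frac{r_i^T(t) - r_i^*(t)}{T} \Big[ (U_i^E)'(\bar{e}_i^T) \Big( 1 - 2 (U_i^V)'\left(\mathrm{Var}^T((r_i^*)_{1:T})\right) \Big( r_i^*(t) - \frac{1}{T}\sum_{\tau=1}^{T} r_i^*(\tau) \Big) \Big) - (U_i^E)'(e_i(t)) \Big( 1 - 2 (U_i^V)'(v_i(t)) \left( r_i^*(t) - m_i(t) \right) \Big) \Big], \quad (17)$$

where $e_i(t) = m_i(t) - U_i^V(v_i(t))$.

From Theorem 1 (a)-(c), and the continuity and boundedness of the functions involved, we can conclude that the expression in square brackets in (17) can be made as small as desired by choosing large enough $T$ and then choosing a large enough $t$. Also, $|r_i^T(t) - r_i^*(t)| \le r_{\max}$ for each $i \in \mathcal{N}$. Hence, taking limits in (17),

$$\lim_{T \to \infty} \left( \phi_T((r^*)_{1:T}) - \phi_T((r^T)_{1:T}) \right) \ge 0$$

holds for almost all sample paths. From the optimality of $(r^T)_{1:T}$,

$$\phi_T((r^T)_{1:T}) \ge \phi_T((r^*)_{1:T}).$$

From the above two inequalities, the result follows. ■

V. Convergence Analysis

This section is devoted to the proof of Theorem 1 which captures the critical convergence properties of reward allocations under AVR. We start the section by studying another optimization problem OPTSTAT closely related to OPT(T).

A. A stationary version of OPT: OPTSTAT

The formulation OPT(T) mainly involves time averages of various quantities associated with it. Instead, the formulation of OPTSTAT is based on the expected value of the corresponding quantities evaluated under the stationary distribution of $(C_t)_t$.

Recall that (see C.1) $(C_t)_t$ is a stationary ergodic process with marginal distribution $(\pi(c) : c \in \mathcal{C})$, i.e., for $c \in \mathcal{C}$, $\pi(c)$ is the probability of the event $C_t = c$. Since $\mathcal{C}$ is finite, we assume that $\pi(c) > 0$ for each $c \in \mathcal{C}$ without any loss of generality. Let $(r(c))_{c \in \mathcal{C}}$ be a vector (of vectors) representing the reward allocation $r(c)$ ($\in \mathbb{R}^N$) to the users for each $c \in \mathcal{C}$. Although we are abusing the notation introduced earlier where $r(t)$ denoted the allocation to the users in slot $t$, one can differentiate between the functions based on the context in which they are being discussed. Now, let

$$\phi^\pi\left((r(c))_{c \in \mathcal{C}}\right) = \sum_{i \in \mathcal{N}} U_i^E\left( \sum_{c \in \mathcal{C}} r_i(c) \pi(c) - U_i^V\left( \mathrm{Var}^\pi\left((r_i(c))_{c \in \mathcal{C}}\right) \right) \right),$$

where

$$\mathrm{Var}^\pi\left((r_i(c))_{c \in \mathcal{C}}\right) = \sum_{c \in \mathcal{C}} \pi(c) \left( r_i(c) - \sum_{c' \in \mathcal{C}} \pi(c') r_i(c') \right)^2.$$

We define the 'stationary' optimization problem OPTSTAT as follows:

$$\max_{(r(c))_{c \in \mathcal{C}}} \ \phi^\pi\left((r(c))_{c \in \mathcal{C}}\right)$$

subject to $c(r(c)) \le 0, \ \forall\, c \in \mathcal{C}$,

$r_i(c) \ge 0, \ \forall\, i \in \mathcal{N}, \ \forall\, c \in \mathcal{C}$.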
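The OPTSTAT objective is straightforward to evaluate for a finite state space. The sketch below is illustrative only: the function names `var_pi` and `optstat_objective`, the data layout (`r[i][k]` is the reward of user `i` in state `k`), and the example utilities are assumptions, not taken from the paper.

```python
import math


def var_pi(pi, r_i):
    """Variance of one user's reward under the stationary distribution pi."""
    mean = sum(p * x for p, x in zip(pi, r_i))
    return sum(p * (x - mean) ** 2 for p, x in zip(pi, r_i))


def optstat_objective(pi, r, u_e, u_v):
    """Sum over users of U_i^E(mean_i - U_i^V(variance_i)).

    pi[k] : stationary probability of state k
    r[i][k] : reward of user i in state k
    u_e[i], u_v[i] : callables standing for U_i^E and U_i^V
    """
    total = 0.0
    for i, r_i in enumerate(r):
        mean = sum(p * x for p, x in zip(pi, r_i))
        total += u_e[i](mean - u_v[i](var_pi(pi, r_i)))
    return total


# One user, two equally likely states, hypothetical utilities.
pi = [0.5, 0.5]
r = [[1.0, 2.0]]
u_e = [lambda e: math.log(e + 3.0)]
u_v = [lambda v: 2.0 * math.sqrt(v + 0.75)]
value = optstat_objective(pi, r, u_e, u_v)
```

Feasibility of each `r(c)` is not checked here; the function only evaluates the objective for a given allocation scheme.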

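The next lemma gives a few useful properties of OPTSTAT.

Lemma 3. (a) OPTSTAT is a convex optimization problem satisfying Slater's condition.

(b) OPTSTAT has a unique solution.

Proof: The proof is similar to that of Lemma 1, and is easy to establish once we prove the convexity of the function $\mathrm{Var}^\pi(\cdot)$. ■

Using Lemma 3 (a), we can conclude that KKT conditions are necessary and sufficient for optimality for OPTSTAT. Let $(r^\pi(c) : c \in \mathcal{C})$ denote the optimal solution.

KKT-OPTSTAT:

There exist constants $(\mu^\pi(c) : c \in \mathcal{C})$ and $((\gamma_i^\pi(c))_{i \in \mathcal{N}} : c \in \mathcal{C})$ such that for all $i \in \mathcal{N}$ and $c \in \mathcal{C}$

$$\pi(c)\, (U_i^E)'\left( \sum_{c' \in \mathcal{C}} \pi(c') r_i^\pi(c') - U_i^V\left(\mathrm{Var}^\pi\left((r_i^\pi(c'))_{c' \in \mathcal{C}}\right)\right) \right) \left( 1 - 2 (U_i^V)'\left(\mathrm{Var}^\pi\left((r_i^\pi(c'))_{c' \in \mathcal{C}}\right)\right) \left( r_i^\pi(c) - \sum_{c' \in \mathcal{C}} \pi(c') r_i^\pi(c') \right) \right) - \mu^\pi(c)\, c_i'(r^\pi(c)) + \gamma_i^\pi(c) = 0, \quad (18)$$
$$\mu^\pi(c)\, c(r^\pi(c)) = 0, \quad (19)$$
$$\gamma_i^\pi(c)\, r_i^\pi(c) = 0, \quad (20)$$

where $c_i'$ denotes $\partial c / \partial r_i$.

In developing the KKT conditions, we used the following result: for any $c_0 \in \mathcal{C}$, $i \in \mathcal{N}$,

$$\frac{\partial\, \mathrm{Var}^\pi\left((r_i(c))_{c \in \mathcal{C}}\right)}{\partial r_i(c_0)} = 2 \pi(c_0) \left( r_i(c_0) - \sum_{c_1 \in \mathcal{C}} \pi(c_1) r_i(c_1) \right).$$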
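The derivative formula for $\mathrm{Var}^\pi$ can be sanity-checked numerically. The sketch below compares the closed form $2\pi(c_0)\left(r_i(c_0) - \sum_{c_1}\pi(c_1) r_i(c_1)\right)$ with a central finite difference; the function names and the numbers are illustrative assumptions.

```python
def var_pi(pi, r_i):
    """Variance of one user's reward under the distribution pi."""
    mean = sum(p * x for p, x in zip(pi, r_i))
    return sum(p * (x - mean) ** 2 for p, x in zip(pi, r_i))


def dvar_pi(pi, r_i, k):
    """Closed-form partial derivative of var_pi with respect to r_i[k]."""
    mean = sum(p * x for p, x in zip(pi, r_i))
    return 2.0 * pi[k] * (r_i[k] - mean)


def finite_difference(pi, r_i, k, h=1e-6):
    """Central difference approximation of the same partial derivative."""
    up = list(r_i)
    down = list(r_i)
    up[k] += h
    down[k] -= h
    return (var_pi(pi, up) - var_pi(pi, down)) / (2.0 * h)


pi = [0.2, 0.3, 0.5]      # hypothetical stationary distribution
r_i = [1.0, 4.0, 2.0]     # hypothetical rewards of one user in each state
errors = [abs(dvar_pi(pi, r_i, k) - finite_difference(pi, r_i, k))
          for k in range(len(pi))]
```

Because the variance is quadratic in the rewards, the central difference agrees with the closed form up to floating-point rounding.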



Next, we find relationships between the optimal solution $(r^\pi(c) : c \in \mathcal{C})$ of OPTSTAT and OPTAVR. To that end, let $m_i^\pi = \sum_{c \in \mathcal{C}} \pi(c) r_i^\pi(c)$, $v_i^\pi = \mathrm{Var}^\pi\left((r_i^\pi(c))_{c \in \mathcal{C}}\right)$ and $e_i^\pi = m_i^\pi - U_i^V(v_i^\pi)$ for each $i \in \mathcal{N}$. Next, let

$$\mathcal{H}^* = \{ (m, v) \in \mathcal{H} : (m, v) \text{ satisfies (21)-(22)} \},$$

where the conditions (21)-(22) are given below:

$$E[r_i^*(m, v, C^\pi)] = m_i \quad \forall\, i \in \mathcal{N}, \quad (21)$$
$$\mathrm{Var}\left(r_i^*(m, v, C^\pi)\right) = v_i \quad \forall\, i \in \mathcal{N}. \quad (22)$$

Recall that $r^*(m, v, c)$ is the optimal solution to OPTAVR(m, v, c).

The next result gives important properties relating $(m^\pi, v^\pi)$ to the set $\mathcal{H}^*$. A proof is given in Appendix A.

Theorem 3. $(m^\pi, v^\pi)$ satisfies the following:
(a) $r^*(m^\pi, v^\pi, c) = r^\pi(c)$ for each $c \in \mathcal{C}$, and
(b) $\mathcal{H}^* = \{(m^\pi, v^\pi)\}$.

In the above discussion, we identified several interesting relationships between OPTSTAT and OPTAVR, and identified some properties of the vectors $m^\pi$, $v^\pi$ and $e^\pi$. Next, we use these to study a differential equation that mimics the evolution of the parameters in AVR.

B. Dynamics of OPTAVR

In this subsection, we focus on establishing convergence of the following differential equation

$$\frac{d\theta(\tau)}{d\tau} = \bar{g}(\theta(\tau)) + z(\theta(\tau)) \quad (23)$$

for $\tau \ge 0$ with $\theta(0) \in \mathcal{H}$, where $\bar{g}(\theta)$ is a function taking values in $\mathbb{R}^{2N}$ defined as follows: for $\theta = (m, v) \in \mathcal{H}$, let

$$(\bar{g}(\theta))_i = E[r_i^*(\theta, C^\pi)] - m_i,$$
$$(\bar{g}(\theta))_{N+i} = E\left[ (r_i^*(\theta, C^\pi) - m_i)^2 \right] - v_i.$$

In (23), $z(\theta) \in -C_{\mathcal{H}}(\theta)$ is the projection term which is the smallest vector that ensures that the solution remains in $\mathcal{H}$ (see Section 4.3 of [16]). The set $C_{\mathcal{H}}(\theta)$ contains only the zero element when $\theta$ is in the interior of $\mathcal{H}$, and for $\theta$ on the boundary of the set $\mathcal{H}$, $C_{\mathcal{H}}(\theta)$ is the infinite convex cone generated by the outer normals at $\theta$ of the faces of $\mathcal{H}$ on which $\theta$ lies. The motivation for studying the above differential equation should be partly clear by comparing the RHS of (23) with the update equations in (12)-(13) in AVR, and we can associate the term $z(\theta)$ with the constrained nature of those update equations. The following result tells us that $z(\theta)$ appearing in (23) is innocuous in the sense that we can ignore it when we study the differential equation. The proof given in Appendix B shows the redundancy of the term $z(\theta)$ by arguing that the differential equation itself ensures that $\theta(\tau)$ stays inside $\mathcal{H}$.

Lemma 4. For any $\theta \in \mathcal{H}$, $z_j(\theta) = 0$ for all $1 \le j \le 2N$.
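The behaviour of the ODE can be illustrated numerically on a toy instance by forward Euler integration with the projection term dropped, as Lemma 4 permits. All modelling choices below are assumptions made for illustration and are not from the paper: one user, two equally likely states with capacities 1 and 3, a constraint $0 \le r \le \text{cap}$, and $U^V(v) = \beta\sqrt{v + \kappa}$, for which the single-user OPTAVR solution is a clipped affine expression.

```python
import math

BETA, KAPPA = 2.0, 0.75                  # hypothetical U^V(v) = BETA * sqrt(v + KAPPA)
STATES = [(0.5, 1.0), (0.5, 3.0)]        # (pi(c), capacity in state c), hypothetical


def r_star(m, v, cap):
    """Single-user OPTAVR solution: m + 1 / (2 (U^V)'(v)), clipped to [0, cap]."""
    return min(max(m + math.sqrt(v + KAPPA) / BETA, 0.0), cap)


def g_bar(m, v):
    """Right-hand side of the ODE without the projection term z."""
    mean_r = sum(p * r_star(m, v, cap) for p, cap in STATES)
    second = sum(p * (r_star(m, v, cap) - m) ** 2 for p, cap in STATES)
    return mean_r - m, second - v


def integrate(m=0.0, v=0.0, step=0.01, n_steps=5000):
    """Forward Euler integration of d(theta)/d(tau) = g_bar(theta)."""
    for _ in range(n_steps):
        dm, dv = g_bar(m, v)
        m, v = m + step * dm, v + step * dv
    return m, v


m_inf, v_inf = integrate()
```

In this toy model the only pair satisfying the fixed-point conditions (21)-(22) is $m = 1.5$, $v = 0.25$, which can be verified by hand, and the Euler trajectory started at the origin approaches it.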

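We say that an allocation scheme $(r(c))_{c \in \mathcal{C}}$ is feasible if for each $c \in \mathcal{C}$, $c(r(c)) \le 0$. We define the set $\widetilde{\mathcal{H}}$ as follows:

$$\widetilde{\mathcal{H}} = \Big\{ (m_1, v_1) : \text{there is some feasible allocation scheme } (r(c))_{c \in \mathcal{C}} \text{ with } E[r_i(C^\pi)] = m_{1i}, \ E\left[ (r_i(C^\pi) - E[r_i(C^\pi)])^2 \right] \le v_{1i} \le r_{\max}^2 \ \forall\, i \in \mathcal{N} \Big\}.$$

We could roughly think of $\widetilde{\mathcal{H}}$ as the set of all 'achievable' mean variance pairs. Here, the restriction $v_{1i} \le r_{\max}^2$ for each $i$ mainly ensures that $\widetilde{\mathcal{H}}$ is bounded. Further, for any $\theta_1 = (m_1, v_1) \in \widetilde{\mathcal{H}}$, let

$$\widetilde{\mathcal{R}}(\theta_1) = \Big\{ (r(c))_{c \in \mathcal{C}} : E[r_i(C^\pi)] = m_{1i}, \ E\left[ (r_i(C^\pi) - E[r_i(C^\pi)])^2 \right] \le v_{1i} \ \forall\, i \in \mathcal{N} \Big\}.$$

We can view $\widetilde{\mathcal{R}}(\theta_1)$ as the set of all feasible reward allocations corresponding to an achievable $\theta_1 \in \widetilde{\mathcal{H}}$. The following result characterizes several useful properties of the sets introduced above; a proof is given in Appendix C.

Lemma 5. (a) $\widetilde{\mathcal{R}}(\theta_1)$ is a non-empty compact subset of $\mathbb{R}^{N|\mathcal{C}|}$ for any $\theta_1 = (m_1, v_1) \in \widetilde{\mathcal{H}}$.

(b) $\widetilde{\mathcal{H}}$ is a bounded, closed and convex set.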

In the next result, we establish a convergence result for the ODE in (23). The proof relies on the optimality properties of the solutions to OPTAVR, a result from [28], Theorem 3 (b), and the following Lyapunov function

$$L(\theta) = -\sum_{i \in \mathcal{N}} U_i^E\left( m_i - U_i^V(v_i) \right) + \eta\, d(\theta, \widetilde{\mathcal{H}})$$

(for an appropriate choice of $\eta > 0$) where (abusing notation for the function $d$) $d(\theta, \widetilde{\mathcal{H}}) = \min_{\theta_1 \in \widetilde{\mathcal{H}}} d(\theta, \theta_1)$, and $d(\theta, \theta_1)$ measures the Euclidean distance between the points $\theta$ and $\theta_1$ in $\mathbb{R}^{2N}$. Note that $L(\theta)$ takes smaller values if $\theta$ is achievable (and hence $d(\theta, \widetilde{\mathcal{H}}) = 0$), and if the current estimates m and v of mean and variance correspond to high values for the objective of OPT(T). This Lyapunov function will in fact be shown to be non-increasing for $\theta(\tau)$ evolving as (23), and is bounded below. This gives a convergence result and uniqueness is argued using Theorem 3 (b). A detailed proof is given in Appendix D.

Theorem 4. Suppose $\theta(\tau)$ evolves according to the ODE (23). Then, for any initialization $\theta(0) \in \mathcal{H}$, $\lim_{\tau \to \infty} \theta(\tau) = \theta^\pi$.

C. Convergence properties of AVR

In this subsection, we discuss the proof of Theorem 1. We first establish a convergence result for $(\theta(t))_t$ using the convergence result for the differential equation (23). We do so by viewing (12)-(13) as a stochastic approximation update equation, using a result from [16] that helps us to relate it to the ODE (23), and establishing the desired convergence result by utilizing the corresponding result obtained for the ODE in Theorem 4. A detailed proof of the result is given in Appendix E.

Lemma 6. If $\theta(0) \in \mathcal{H}$, then the sequence $(\theta(t))_t$ generated by AVR converges almost surely to $\theta^\pi$.

Now we can prove Theorem 1 mainly using Lemma 6, and stationarity and ergodicity assumptions. The detailed arguments are given in Appendix F.

VI. Conclusions 

This work presents an important generalization of the NUM framework to account for the deleterious impact of temporal variability, allowing for tradeoffs between mean, fairness and variability associated with reward allocations across a set of users. We proposed a simple asymptotically optimal online algorithm, AVR, to solve problems falling in this framework. We believe such extensions to capture variability in resource allocations can be relevant to a fairly wide variety of systems.

Appendix A
Proof of Theorem 3

Proof: By KKT-OPTSTAT, $(r^\pi(c) : c \in \mathcal{C})$, $(\mu^\pi(c) : c \in \mathcal{C})$ and $((\gamma_i^\pi(c))_{i \in \mathcal{N}} : c \in \mathcal{C})$ satisfy (18)-(20). To show that $r^*(m^\pi, v^\pi, c) = r^\pi(c)$, we verify that $r^\pi(c)$ satisfies KKT-OPTAVR$(m^\pi, v^\pi, c)$. To that end, we can verify that $r^\pi(c)$ along with $\mu^* = \mu^\pi(c) / \pi(c)$ and $(\gamma_i^* = \gamma_i^\pi(c) / \pi(c) : i \in \mathcal{N})$ satisfy (14)-(16) by using (18)-(20). This proves part (a).

We prove part (b) by establishing (i)-(iv) listed below. First we show (i) $(m^\pi, v^\pi) \in \mathcal{H}^*$. Then, assuming $(m_1, v_1) \in \mathcal{H}^*$, we show that

(ii) $(r^*(m_1, v_1, c))_{c \in \mathcal{C}}$ is an optimal solution to OPTSTAT.

Finally, assuming that $(m_2, v_2) \in \mathcal{H}^*$, we show that

(iii) $r^*(m_1, v_1, c) = r^*(m_2, v_2, c)$ for each $c \in \mathcal{C}$, and

(iv) $m_{1i} = m_{2i}$ for each $i \in \mathcal{N}$, and $v_{1i} = v_{2i}$ for each $i \in \mathcal{N}$.

Note that part (b) (the uniqueness of the solution to (21)-(22)) follows from (i) and (iv). In the following, we prove (i)-(iv).

Firstly, note that (i) follows from (a) and the definitions of $m^\pi$ and $v^\pi$. Next, we prove (ii). For each $c \in \mathcal{C}$, $r^*(m_1, v_1, c)$ is an optimal solution to OPTAVR and thus, there exist non-negative constants $\mu_1^*(c)$ and $(\gamma_{1i}^*(c) : i \in \mathcal{N})$ that, for all $i \in \mathcal{N}$, satisfy KKT-OPTAVR given in (14)-(16), i.e.,

$$(U_i^E)'(e_{1i}) \left( 1 - 2 (U_i^V)'(v_{1i}) \left( r_i^*(\theta_1, c) - m_{1i} \right) \right) + \gamma_{1i}^*(c) - \mu_1^*(c)\, c_i'(r^*(\theta_1, c)) = 0,$$
$$\mu_1^*(c)\, c(r^*(\theta_1, c)) = 0,$$
$$\gamma_{1i}^*(c)\, r_i^*(\theta_1, c) = 0,$$

where $\theta_1 = (m_1, v_1)$, and $e_{1i} = m_{1i} - U_i^V(v_{1i})$ for each $i \in \mathcal{N}$. Since $(m_1, v_1) \in \mathcal{H}^*$, it satisfies (21)-(22). Combining these observations, we can rewrite the above equations as follows: for all $c \in \mathcal{C}$

$$(U_i^E)'\left( E[r_i^*(\theta_1, C^\pi)] - U_i^V\left(\mathrm{Var}(r_i^*(\theta_1, C^\pi))\right) \right) \left( 1 - 2 (U_i^V)'\left(\mathrm{Var}(r_i^*(\theta_1, C^\pi))\right) \left( r_i^*(\theta_1, c) - E[r_i^*(\theta_1, C^\pi)] \right) \right) + \gamma_{1i}^*(c) - \mu_1^*(c)\, c_i'(r^*(\theta_1, c)) = 0,$$
$$\mu_1^*(c)\, c(r^*(\theta_1, c)) = 0,$$
$$\gamma_{1i}^*(c)\, r_i^*(\theta_1, c) = 0.$$

Now for each $c \in \mathcal{C}$, multiply the above equations with $\pi(c)$ and one obtains KKT-OPTSTAT ((18)-(20)) with $(\pi(c) \mu_1^*(c) : c \in \mathcal{C})$ and $((\pi(c) \gamma_{1i}^*(c))_{i \in \mathcal{N}} : c \in \mathcal{C})$ as associated Lagrange multipliers. From Lemma 3 (a), OPTSTAT satisfies Slater's condition and hence satisfying KKT conditions is sufficient for optimality for OPTSTAT. Thus, we have that $(r^*(m_1, v_1, c))_{c \in \mathcal{C}}$ is an optimal solution to OPTSTAT. This proves part (ii).

Now suppose that $(m_1, v_1), (m_2, v_2) \in \mathcal{H}^*$, and suppose that for some $c_0 \in \mathcal{C}$, $r^*(m_1, v_1, c_0) \ne r^*(m_2, v_2, c_0)$. Thus, using this together with part (ii), we have that $(r^*(m_1, v_1, c))_{c \in \mathcal{C}}$ and $(r^*(m_2, v_2, c))_{c \in \mathcal{C}}$ are two distinct solutions to OPTSTAT. However, this contradicts the fact that OPTSTAT has a unique solution (see Lemma 3 (b)). Thus, (iii) has to hold.

Now suppose that $(m_1, v_1), (m_2, v_2) \in \mathcal{H}^*$, and that (iv) does not hold. Then, we can conclude that at least one of the conditions given in part (iv) does not hold. For instance, suppose that $v_{1j} \ne v_{2j}$ for some $j \in \mathcal{N}$. This along with the fact that $(m_1, v_1), (m_2, v_2) \in \mathcal{H}^*$ (and thus they satisfy (22)) implies that $\mathrm{Var}(r_j^*(m_1, v_1, C^\pi)) \ne \mathrm{Var}(r_j^*(m_2, v_2, C^\pi))$. Thus, we can conclude that for some $c_0 \in \mathcal{C}$, $r^*(m_1, v_1, c_0) \ne r^*(m_2, v_2, c_0)$ which contradicts part (iii). We can reach the same conclusion if any other condition given in (iv) is violated. Thus, (iv) has to hold. ■

Appendix B 
Proof of Lemma 4 

Proof: Recall that $\mathcal{H} = [0, r_{\max}]^N \times [0, v_{\max}]^N$ and $v_{\max} = r_{\max}^2$. For any $\theta \in \mathcal{H}$, using the fact that $0 \le r_i^*(\theta, C^\pi), m_i \le r_{\max}$ for each $i \in \mathcal{N}$, we can show that $z_j(\theta) = 0$ for any $j$ such that $1 \le j \le N$. Similarly, since $v_{\max} = r_{\max}^2$, we can show that $z_j(\theta) = 0$ for any $j$ such that $N + 1 \le j \le 2N$. ■

Appendix C 
Proof of Lemma 5 

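Proof: For any $\theta_1 \in \widetilde{\mathcal{H}}$, using the definition of $\widetilde{\mathcal{H}}$, we see that $\widetilde{\mathcal{R}}(\theta_1)$ is a non-empty set. The set $\{ r \in \mathbb{R}^N : c(r) \le 0, \ r_i \ge 0 \ \forall\, i \in \mathcal{N} \}$ is closed due to continuity of the functions $c \in \mathcal{C}$, and thus the set of feasible allocations, i.e., $\prod_{c \in \mathcal{C}} \{ r \in \mathbb{R}^N : c(r) \le 0, \ r_i \ge 0 \ \forall\, i \in \mathcal{N} \}$, is closed. Now, note that $\widetilde{\mathcal{R}}(\theta_1)$ is the intersection of the (closed) set of feasible allocations, and inverse images of closed sets associated with continuous functions defined over the set of feasible allocations. Thus, $\widetilde{\mathcal{R}}(\theta_1)$ is closed. Also, $\widetilde{\mathcal{R}}(\theta_1)$ is bounded, and hence compact. This proves (a).

$\widetilde{\mathcal{H}}$ is bounded since $0 \le m_{1i} \le r_{\max}$ and $0 \le v_{1i} \le r_{\max}^2$ for each $i \in \mathcal{N}$, and each $(m_1, v_1) \in \widetilde{\mathcal{H}}$.

Let $(m', v')$ be any limit point of $\widetilde{\mathcal{H}}$. Then, there exists a sequence $((m_n, v_n))_{n \in \mathbb{Z}_+} \subset \widetilde{\mathcal{H}}$, such that $\lim_{n \to \infty} (m_n, v_n) = (m', v')$. Let $(r_n(c))_{c \in \mathcal{C}} \in \widetilde{\mathcal{R}}((m_n, v_n))$ for each $n \in \mathbb{Z}_+$. Then, $((r_n(c))_{c \in \mathcal{C}})_{n \in \mathbb{Z}_+}$ is a sequence in the set of feasible allocations which is a compact set. Thus, it has some convergent subsequence $((r_{n_k}(c))_{c \in \mathcal{C}})_{k \in \mathbb{Z}_+}$, and suppose that the subsequence converges to a feasible allocation $(r'(c))_{c \in \mathcal{C}}$. Then,

$$E[r_i'(C^\pi)] = \lim_{k \to \infty} E[r_{n_k i}(C^\pi)] = \lim_{k \to \infty} m_{n_k i} = m_i'.$$

Also,

$$E\left[ (r_i'(C^\pi) - E[r_i'(C^\pi)])^2 \right] = \lim_{k \to \infty} E\left[ (r_{n_k i}(C^\pi) - E[r_{n_k i}(C^\pi)])^2 \right] \le \lim_{k \to \infty} v_{n_k i} = v_i'.$$

Thus, $(r'(c))_{c \in \mathcal{C}} \in \widetilde{\mathcal{R}}((m', v'))$, and hence, $(m', v') \in \widetilde{\mathcal{H}}$. Since the limit point was arbitrary, $\widetilde{\mathcal{H}}$ contains all its limit points and hence is closed.

To show convexity, consider $(m_1, v_1), (m_2, v_2) \in \widetilde{\mathcal{H}}$, and we show that for any given $\alpha \in [0, 1]$, we have $\alpha (m_1, v_1) + (1 - \alpha)(m_2, v_2) \in \widetilde{\mathcal{H}}$. Let $(r_1(c))_{c \in \mathcal{C}} \in \widetilde{\mathcal{R}}((m_1, v_1))$ and $(r_2(c))_{c \in \mathcal{C}} \in \widetilde{\mathcal{R}}((m_2, v_2))$. Hence, $E\left[ (r_{1i}(C^\pi) - E[r_{1i}(C^\pi)])^2 \right] \le v_{1i}$ and $E\left[ (r_{2i}(C^\pi) - E[r_{2i}(C^\pi)])^2 \right] \le v_{2i}$ $\forall\, i \in \mathcal{N}$. Let $r_3(c) = \alpha r_1(c) + (1 - \alpha) r_2(c)$. Thus, for each $i \in \mathcal{N}$,

$$E[r_{3i}(C^\pi)] = \alpha m_{1i} + (1 - \alpha) m_{2i}. \quad (24)$$

Further, using convexity of the variance (proved earlier), we have that for each $i \in \mathcal{N}$,

$$E\left[ (r_{3i}(C^\pi) - E[r_{3i}(C^\pi)])^2 \right] \le \alpha E\left[ (r_{1i}(C^\pi) - E[r_{1i}(C^\pi)])^2 \right] + (1 - \alpha) E\left[ (r_{2i}(C^\pi) - E[r_{2i}(C^\pi)])^2 \right] \le \alpha v_{1i} + (1 - \alpha) v_{2i}. \quad (25)$$

Thus, from (24) and (25), we have that $(r_3(c))_{c \in \mathcal{C}} \in \widetilde{\mathcal{R}}(\alpha (m_1, v_1) + (1 - \alpha)(m_2, v_2))$, and thus $\alpha (m_1, v_1) + (1 - \alpha)(m_2, v_2) \in \widetilde{\mathcal{H}}$. ■

Appendix D
Proof of Theorem 4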

Proof: We prove the result using LaSalle's invariance principle (see Theorem 4.4 from [13]). To use this result, we need to show that $\mathcal{H}$ is positively invariant with respect to the ODE (23), and we need a continuously differentiable Lyapunov function $L : \mathcal{H} \to \mathbb{R}$ such that

$$\frac{dL(\theta(\tau))}{d\tau} \ \begin{cases} < 0, & \theta \in \mathcal{H} \setminus \{\theta^\pi\}, \\ = 0, & \theta = \theta^\pi. \end{cases} \quad (26)$$

Using arguments similar to that in the proof of Lemma 4, we can show that $\mathcal{H}$ is positively invariant with respect to ODE (23). Now, we show (26) by considering the following (continuously differentiable) Lyapunov function



$$L(\theta) = -\sum_{i \in \mathcal{N}} U_i^E\left( m_i - U_i^V(v_i) \right) + \eta\, d(\theta, \widetilde{\mathcal{H}}),$$

where



'/ 



y/tt 



((' 



,d v max VN 



1. 



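Let $\tilde{\theta}$ denote the (unique) projection of $\theta$ on to the bounded, closed and convex set $\widetilde{\mathcal{H}}$ (see Lemma 5). Note that throughout this proof, we use the tilde to denote this projection operator. Thus, for any $\theta \in \mathcal{H}$, $d(\theta, \widetilde{\mathcal{H}}) = d(\theta, \tilde{\theta})$, where $(\tilde{m}, \tilde{v}) = \tilde{\theta}$.

Also, note that for any $(\tilde{r}(c))_{c \in \mathcal{C}} \in \widetilde{\mathcal{R}}(\tilde{\theta})$ we have $\tilde{m}_i = E[\tilde{r}_i(C^\pi)]$, and $\tilde{v}_i \ge E\left[ (\tilde{r}_i(C^\pi) - E[\tilde{r}_i(C^\pi)])^2 \right]$.

Given next is a corollary of the result in [28].

Lemma ([28]): Suppose that $V$ is a convex bounded closed subset of $\mathbb{R}^N$. Suppose that a vector function $(x(\tau))_{\tau \ge 0}$, taking values in $\mathbb{R}^N$, is Lipschitz continuous, satisfying the following differential equation for almost all $\tau \ge 0$:

$$\frac{d}{d\tau} x(\tau) = v(\tau) - x(\tau), \quad v(\tau) \in V.$$

Then, the distance $d(x(\tau), V)$ between $x(\tau)$ and the set $V$ is a Lipschitz continuous non-increasing function, and moreover, for almost all $\tau \ge 0$, we have

$$\frac{d}{d\tau} d(x(\tau), V) \le -d(x(\tau), V).$$

Using the above result by setting $V = \widetilde{\mathcal{H}}$ (which is a convex, bounded and closed set as shown in Lemma 5 (b)) and considering the ODE (23), we have

$$\frac{d}{d\tau} d(\theta(\tau), \widetilde{\mathcal{H}}) \le -d(\theta(\tau), \widetilde{\mathcal{H}}). \quad (27)$$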
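The distance-decay property used above can be illustrated numerically. The sketch below is a toy check, not part of the proof: it takes $V$ to be the unit square in $\mathbb{R}^2$ (an assumed example set), drives $dx/d\tau = v(\tau) - x(\tau)$ with an arbitrary sequence of points $v \in V$, and uses forward Euler steps. Each Euler step is a convex combination of $x$ and a point of $V$, so the distance to the convex set contracts by at least the factor $1 - \text{step}$.

```python
import math


def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection of x onto the box [lo, hi]^n."""
    return [min(max(xi, lo), hi) for xi in x]


def dist_to_box(x):
    """Euclidean distance from x to the unit box."""
    p = project_box(x)
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))


def simulate(x, step=0.01, n_steps=500):
    """Euler steps of dx/dtau = v - x with v cycling over the corners of the box."""
    corners = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    distances = [dist_to_box(x)]
    for k in range(n_steps):
        v = corners[k % 4]   # any point of V would do; corners are an arbitrary choice
        x = [xi + step * (vi - xi) for xi, vi in zip(x, v)]
        distances.append(dist_to_box(x))
    return distances


STEP = 0.01
d = simulate([3.0, -2.0], step=STEP, n_steps=500)
```

The recorded distances are non-increasing and decay at least geometrically, which is the discrete-time counterpart of the differential inequality in the lemma.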

Now, using (23), Lemma 4 and (27), we have that 
dL (9) 



Throughout this proof, as we have done above, we suppress 
the dependence of variables on time for brevity. Manipulating 
the above inequality, we have 



dL{9) 
dr 



< 



If we let 



I (9) 



dr 



< -E(^) (m-Ur(v t ))(E[r*(9,C^]- mi 



(t/f ) (rm - Ur (Vi)) (E [r*(d, C)] (28) 
-(UY) \vi)(E [(r*(9,C*)- mi f 

+ E ( U < E ) K - U Y (v^) (fhi (ur)' 

- e ( u n (™i - u y fa - m i) 

+ E Pi) K - UY ( Vi )) (Ur)' (vi) & - Vi ) 
-rjd (o,ii) ■ 



E(^y 

iSJV 

■{urn* 



m-VYM) (E [r*(0,C*)] 
E[(r*(9,C^~m l [ 



+ max _ 
(r(c)) cgc e£(e) 



C^(«i)) 



(UY) (vi)(E [(n(Cn-mi) 2 ]))), 



U 



(vi) (E [(r*(d,C*) - m,) 2 - - ^ -
then the first term on the right hand side of (28) can be expressed as follows



Now let 



(U?)( Vi )(E [(rUe,C)-mi] 

I (9) - max ]T (™i - U i 

(?(c) )cec6 K(e) 

(e [n (C)] - (UY )' (vi) (e [(n (C*) - mi y 

I (9) - max Yl ("* ~ U i 

(?(c)) cec 6K(e) 

-{uY)\vi){E [(niC^-mif 



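$$d^E_{\max} = \max_{i \in \mathcal{N}} \ \max_{e_i \in [e_{\min,i},\, e_{\max,i}]} (U_i^E)'(e_i),$$

$$d^E_{\min} = \min_{i \in \mathcal{N}} \ \min_{e_i \in [e_{\min,i},\, e_{\max,i}]} (U_i^E)'(e_i),$$

$$d^V_{\max} = \max_{i \in \mathcal{N}} \ \max_{v_i \in [0,\, v_{\max}]} (U_i^V)'(v_i),$$

$$d^V_{\min} = \min_{i \in \mathcal{N}} \ \min_{v_i \in [0,\, v_{\max}]} (U_i^V)'(v_i).$$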


Noting that (using Assumptions U.V and U.E discussed in 
Section II) < d E lin < d E ax < oo and < < in < dj^ < 



r/r 



Z (0) - max J2 ( U * E ) ( m * ~ U i ^) 

(m, - (C/f)' („,) (E [(r, (C*) - E [r t (C«)]f 



oo, and using equivalence of norms, we have 

< 1 ( e ) + (m, m) + VN df^d (m, m) 

+<ax< ax ^ (v, v) - V d (e, n) 

jV 



^min^mn 



- £ )' - u i ^)) K) w K - s (^)i) 5 

= 1(0) - Y (Uf)' {mi - UY («i)) (m< - (t^)' («0 «j' 
+ £ (C/f )' (m, - („«)) (C^)' (m, - m 4 ) 2 

E\' f 1 ™ rrV 



max _ > (vi 

2l 



< 



I (0) - T7<i (0, Uj + \/2d (e, H) d 
max ((< ax + Viv) , dY^VN 



E 

max 



-d E - d v - 



max_ (^i 



max_ (U? 
(?(c)) cec eK(e) ieN 



E 



M - UY (Vi)) 
{uy)\f H )(y i -E[(jr i (C')-E^ i (C')]) 

Now, using above equation in (28), we have that 
dL (9) 



{n{C«)-E[n(C*)]y 



< l(0)-d[0,H )-l(0) 



where 



dT 



1(9) - Y (U E ) (m, - UY (vi)) (fhi - rm) 



+ £ {v? ) H - ^ (^)) (^) fa) - 



-rjd (0,H 



I (9)= max_ E(^- E 
(?(c)) cec eK(§) ie ^V 

Using the definition of and the optimality of 

(r*(9, c)) cgC , we can conclude that I (9) < for any 9 £ H. 

Also, from the definition of 7l(jij, we can conclude that 

-1(9) < 0. 

Next, we argue that <I ' L } = only for 9 = 9 W , and strictly 
negative for all other 9. Firstly, note that since satisfies 
(21)-(22) (see Theorem 3 (b)). Hence = for 9 = 9*. 

Next, we show that (^£1 <^ 1(e)- d (o, 

max ~/^E ( U * E ) { m i- U i (^))for any 9^9* such that 9 = 9. Let u: 

there exists some other 9 = ( m , v J such that Z I 

d(e',n\ - = 0- Since Z (e''\ < 0, -d(o' ,% \ < 

and —I [9 ] < 0, we can conclude that each of these terms are 



+ £(Cf) (m*-^^))^) (*)($-*) 



(^i V )' (?i - E [(n (Cn - E {n (C*)}) 2 } ) 
< Z (0) + £ { U ?) (™i - U Y M) \fhi - rrnl 



zero. In particular, Z (9 j = 0, 9 =9 and Z (9 j =0. Since 
9=9 and Z ( 9 ) = 0, we can conclude that there is some 



? [9 ,c 
and J5 



+ E(^f) (m^f/r^))^) (^)K-m,) 2 

+ £ (C/f )' (m, - («<)) (C^)' («<) |Bi - «i| 

-r?d - _ max £ (C/f )' (m 4 - («,)) 

(r(c)) cec eK(e) ieAf . g ^ Nqw us . ng fect ^ ^ 

(c^)' («< - s [(^ - E [n (C*)]f ' 



£ K\9 ) such that _E 



cec 



2 



= Vi for each 



= 0, and due the 

uniqueness of the optimal solution of OPTAVR$(\theta', c)$ for each $c \in \mathcal{C}$, we can conclude that $\tilde{r}(\theta', c) = r^*(\theta', c)$. Hence,



we have that 



E 



E 



r* , 



E 



(n (9 



C" T \-E 



E 



,G*\-E 



r* o ,cr 



n (0 ,C 

= v. V i G Af. 



Hence, we see that $\theta'$ satisfies (21)-(22), and thus $\theta' \in \mathcal{H}^* = \{\theta^\pi\}$ (see Theorem 3 (b)). Hence, $\theta' = \theta^\pi$, which gives a contradiction.

Thus, we have shown that (26) holds, and now, the main result follows from Theorem 4.4 of [13]. ■

Appendix E 
Proof of Lemma 6 

Proof: We start proving the result by viewing (12)-(13) as a stochastic approximation update equation, and using Theorem 1.1 of Chapter 6 from [16] to relate (12)-(13) to the ODE (23).

In the following, we show that all the assumptions required to use the theorem are satisfied. The following sets, variables and functions $H$, $\theta_t$, $\xi_t$, $Y_t$, $\epsilon_t$, sigma algebras $\mathcal{F}_t$, $\beta_t$, $\delta M_t$ and the function $g$ appearing in the exposition of that theorem correspond to the following variables and functions in our problem setting: $H = \mathcal{H}$, $\theta_t = (m(t), v(t))$, $\xi_t = c_t$, for each $i \in \mathcal{N}$ $(Y_t)_i = r_i^*(t) - m_i(t)$ and $(Y_t)_{i+N} = (r_i^*(t) - m_i(t))^2 - v_i(t)$, $\epsilon_t = \frac{1}{t}$ for each $t$, $\mathcal{F}_t$ is such that $(\theta_0, Y_{i-1}, \xi_i, i \le t)$ is $\mathcal{F}_t$-measurable, $\beta_t = 0$ and $\delta M_t = 0$ for each $t$, $(g((m, v), c))_i = r_i^*(m, v, c) - m_i$ and $(g((m, v), c))_{i+N} = (r_i^*(m, v, c) - m_i)^2 - v_i$.

The Equation 5.1.1 in [16] is satisfied due to our choice of $\epsilon_t$, and (A4.3.1) is satisfied due to our choice of $\mathcal{H}$. Further, (A.1.1) is satisfied as the solutions to OPTAVR are bounded. (A.1.2) holds due to the continuity result in Lemma 2 (a).

We next show that (A.1.3) holds by choosing the function $\bar{g}$ as follows for each $i \in \mathcal{N}$: $(\bar{g}(m, v))_i = E[r_i^*(m, v, C^\pi)] - m_i$, and $(\bar{g}(m, v))_{i+N} = E\left[ (r_i^*(m, v, C^\pi) - m_i)^2 \right] - v_i$. Note that the continuity of the function $\bar{g}$ follows from Lemma 2 (b).

From Section 6.2 of [16], if does not go to zero faster 
than the order of for (A. 1.3) to hold, we only need to show 
that the strong law of large numbers holds for (g (m, v, Ct)) t 
for any q. Strong law of large numbers holds since {Ct) t is a 
stationary ergodic random process and $g$ is a bounded function. (A.1.4) and (A.1.5) hold since $\beta_t = 0$ and $\delta M_t = 0$ for each $t$. To check (A.1.6) and (A.1.7), we use some sufficient conditions discussed in [16] following the theorem. (A.1.6) holds since $g$ is bounded. (A.1.7) holds due to the continuity of $g((m, v), c)$ in $(m, v)$ uniformly in $c$, which follows from the continuity result in Lemma 2 (a), and the finiteness of $\mathcal{C}$. Thus, using Theorem 1.1, we can conclude that on almost all sample paths, $(\theta(t))_t$ converges to some limit set of the ODE (23) in $\mathcal{H}$. From Theorem 4, for any initialization in $\mathcal{H}$, the limit set only contains $\theta^\pi$, and thus the main result follows. ■



Appendix F 
Proving Theorem 1 using Lemma 6 

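Here, we prove the claims in Theorem 1 using Lemma 6. Firstly, note that by Lemma 6 and the continuity of $r^*(\theta, c)$ (see Lemma 2 (a)), for any $c \in \mathcal{C}$,

$$\lim_{t \to \infty} r^*(\theta(t), c) = r^*(\theta^\pi, c). \quad (29)$$

Using this fact along with the ergodicity of $(C_t)_t$, we have that for any realization $(c_t)_t$ of $(C_t)_t$

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} I(c_t = c)\, r^*(\theta(t), c) = r^*(\theta^\pi, c) \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} I(c_t = c) = \pi(c)\, r^*(\theta^\pi, c).$$

Since $r^*(\theta(t), c_t) = \sum_{c \in \mathcal{C}} I(c_t = c)\, r^*(\theta(t), c)$ and $\mathcal{C}$ is a finite set, we can use the above equation to conclude that

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} r^*(\theta(t), c_t) = \lim_{T \to \infty} \frac{1}{T} \sum_{c \in \mathcal{C}} \sum_{t=1}^{T} I(c_t = c)\, r^*(\theta(t), c)$$
$$= \sum_{c \in \mathcal{C}} \pi(c)\, r^*(\theta^\pi, c)$$
$$= \sum_{c \in \mathcal{C}} \pi(c)\, r^\pi(c)$$
$$= m^\pi$$
$$= \lim_{t \to \infty} m(t),$$

where the third equality follows from Theorem 3 (a) and the last equality follows from Lemma 6. This proves part (a).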
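Next, we prove part (b). Note that for each $i \in \mathcal{N}$,

$$\mathrm{Var}^T((r_i^*)_{1:T}) = \frac{1}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(t), c_t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^*(\theta(\tau), c_\tau) \right)^2 \quad (30)$$
$$= \frac{1}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(t), c_t) - r_i^*(\theta(T), c_t) \right)^2$$
$$\quad + \frac{2}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(t), c_t) - r_i^*(\theta(T), c_t) \right) \left( r_i^*(\theta(T), c_t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^*(\theta(\tau), c_\tau) \right)$$
$$\quad + \frac{1}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(T), c_t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^*(\theta(\tau), c_\tau) \right)^2.$$

Note that for any $c \in \mathcal{C}$, using (29), we can argue that

$$\frac{1}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(t), c) - r_i^*(\theta(T), c) \right)^2$$

can be made as small as desired by choosing a large enough $T$. Thus the limit as $T$ goes to infinity is zero. Thus, using arguments similar to that used to prove part (a), we can show that

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(t), c_t) - r_i^*(\theta(T), c_t) \right)^2 = 0.$$

Similarly, we can argue

$$\lim_{T \to \infty} \frac{2}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(t), c_t) - r_i^*(\theta(T), c_t) \right) \left( r_i^*(\theta(T), c_t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^*(\theta(\tau), c_\tau) \right) = 0.$$

Using part (a) and (29), we can conclude that for any $i \in \mathcal{N}$ and $c \in \mathcal{C}$,

$$\lim_{T \to \infty} \left( r_i^*(\theta(T), c) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^*(\theta(\tau), c_\tau) \right) = r_i^*(\theta^\pi, c) - m_i^\pi.$$

Using ergodicity of $(C_t)_t$ and the above equation, we have that for any $i \in \mathcal{N}$,

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(T), c_t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^*(\theta(\tau), c_\tau) \right)^2$$
$$= \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{c \in \mathcal{C}} I(c_t = c) \left( r_i^*(\theta(T), c) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^*(\theta(\tau), c_\tau) \right)^2$$
$$= \sum_{c \in \mathcal{C}} \pi(c) \left( r_i^*(\theta^\pi, c) - m_i^\pi \right)^2$$
$$= \sum_{c \in \mathcal{C}} \pi(c) \left( r_i^\pi(c) - m_i^\pi \right)^2$$
$$= v_i^\pi,$$

where the third equality follows from Theorem 3 (a).

Now, for each $i \in \mathcal{N}$, using the above observation and (30),

$$\lim_{T \to \infty} \mathrm{Var}^T((r_i^*)_{1:T}) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \left( r_i^*(\theta(T), c_t) - \frac{1}{T} \sum_{\tau=1}^{T} r_i^*(\theta(\tau), c_\tau) \right)^2 = \lim_{t \to \infty} v_i(t),$$

where the last equality follows from Lemma 6. This proves part (b).

Part (c) follows from parts (a) and (b).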



